Cosmic Global - London Connectivity Issues – Incident details

London Connectivity Issues

Resolved
Major outage
Started 4 days ago
Lasted about 8 hours

Affected

Network Points of Presence

Major outage from 9:00 AM to 10:40 AM, Degraded performance from 10:40 AM to 4:46 PM

London, United Kingdom

Major outage from 9:00 AM to 10:40 AM, Degraded performance from 10:40 AM to 4:46 PM

Updates
  • Resolved

    Following emergency maintenance yesterday that required a reboot of a core router in our London facility, a bug in the router's Arista runtime software caused its ARP entries to gradually drop out of active memory.

    Although the router's configuration remained correct throughout, the hardware chip (ASIC) responsible for directing network traffic failed to correctly reload the address mappings after the reboot. These mappings are what tell the router how to reach a set of internal endpoints used for multicast traffic forwarding. With them missing from the hardware's active memory, traffic that should have been flowing through those paths was silently dropped.

    Because the configuration itself was never corrupted, the root cause was not immediately obvious. A number of other potential causes were investigated before the true issue was identified: a desync between the router's stored configuration and what the hardware had actually loaded into memory.

    We sincerely apologize for the impact this had on your services and for the time it took to identify the root cause. We understand how frustrating extended investigations can be, and we appreciate your patience while our engineers worked methodically through the contributing factors to reach a definitive resolution.

  • Monitoring

    We have implemented another round of fixes and connectivity is recovering. Please reach out to us if you are still having issues while we continue to monitor.

    Thank you again for your patience in this matter. We will provide a full report when we confirm all is well.

  • Update

    We are continuing to investigate TCP issues in the London PoP. We apologize for the continued problems today and are making progress toward a full resolution for this location.

  • Identified

    We are continuing to monitor reports of elevated issues and are still working towards a permanent resolution.

  • Monitoring

    We have rolled out a batch of fixes and are seeing connectivity recover. We are continuing to monitor the situation closely.

  • Identified

    We have identified an issue with our filtering software in our London PoP and are working on a resolution as quickly as possible.
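
For readers who want a concrete picture of the desync described in the Resolved update, the following is a minimal illustrative sketch, not the tooling our engineers used. It assumes the address mappings held in the router's software and those actually loaded into the hardware can each be exported as a simple IP-to-MAC dictionary. The function name `find_desynced_entries` and the sample addresses are hypothetical.

```python
from typing import Dict, List


def find_desynced_entries(software_table: Dict[str, str],
                          hardware_table: Dict[str, str]) -> List[str]:
    """Return IPs whose software mapping is missing from, or differs in, hardware."""
    desynced = []
    for ip, mac in sorted(software_table.items()):
        # A mapping is healthy only if hardware holds the same MAC for this IP.
        if hardware_table.get(ip) != mac:
            desynced.append(ip)
    return desynced


# Hypothetical sample data: documentation-range IPs and made-up MACs.
software = {
    "192.0.2.10": "00:00:5e:00:53:01",
    "192.0.2.11": "00:00:5e:00:53:02",
}
hardware = {
    "192.0.2.10": "00:00:5e:00:53:01",
    # 192.0.2.11 was never loaded into hardware after the reboot.
}

missing = find_desynced_entries(software, hardware)
print(missing)
```

A check of this kind compares what the hardware actually holds against what the software expects, which is why it can flag a fault that a configuration review alone would not reveal.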