Cosmic Global - Notice history

100% - uptime

Los Angeles, California - Operational

100% - uptime
Jan 2026: 99.59% · Feb 2026: 100.0% · Mar 2026: 100.0%

Dallas, Texas - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 99.77%

Ashburn, Virginia - Operational

100% - uptime
Jan 2026: 99.59% · Feb 2026: 100.0% · Mar 2026: 100.0%

London, United Kingdom - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 99.77%

Amsterdam, Netherlands - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 100.0%

Frankfurt, Germany - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 100.0%
100% - uptime

Transit - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 100.0%

Proxies - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 100.0%

API - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 100.0%

Customer Portal - Operational

100% - uptime
Jan 2026: 100.0% · Feb 2026: 100.0% · Mar 2026: 100.0%
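
To put the monthly figures above in more familiar terms, an uptime percentage can be converted into an approximate amount of downtime. The sketch below is only illustrative arithmetic: the function name is our own, and because the published percentages are rounded to two decimals, the results are estimates rather than measured outage durations.

```python
import calendar


def downtime_minutes(uptime_percent: float, year: int, month: int) -> float:
    """Estimate the minutes of downtime implied by a monthly uptime percentage."""
    days_in_month = calendar.monthrange(year, month)[1]
    total_minutes = days_in_month * 24 * 60
    return (100.0 - uptime_percent) / 100.0 * total_minutes


# Percentages taken from the component history above.
for label, pct, month in [("Jan 2026", 99.59, 1), ("Mar 2026", 99.77, 3)]:
    minutes = downtime_minutes(pct, 2026, month)
    print(f"{label}: {pct}% uptime is roughly {minutes:.0f} minutes of downtime")
```

Under these assumptions, 99.59% over a 31-day month works out to roughly 183 minutes, and 99.77% to roughly 103 minutes.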

Notice history

Mar 2026

DFW Outage
  • Postmortem

    Our explanation:

    Over the past few weeks, some of you may have experienced brief service interruptions across parts of our network. We want to be upfront about what happened, what we've learned, and, most importantly, what we've done about it.

    Our edge routers have always been built with internal redundancy in mind - redundant supervisors, redundant power supplies, multiple line cards, and redundant fabric modules. That level of hardware resilience handles the vast majority of failure scenarios well.

    However, the recent outages exposed a gap: when an issue affects the chassis itself, such as a software defect, a firmware upgrade that requires a full reload, or a software process crashing on the router (as happened twice recently in London), there was no second device to immediately absorb the traffic. The router was redundant in every way except the one that mattered in these incidents.

    What we're doing:

    We're rolling out a dual-router design across all six of our points of presence: Dallas, Ashburn, Los Angeles, London, Amsterdam, and Frankfurt. Once complete, every PoP will operate with two independent edge routers in an active/active configuration, with full BGP session redundancy to all upstream and peering partners. If an entire chassis needs to be taken offline for maintenance, a software upgrade, or an unexpected failure, traffic will automatically reconverge on the second device with no customer-facing impact.

    Each router in the pair will run on independent power feeds with independent management and control planes. We're also using this as an opportunity to standardize failover testing procedures across all PoPs, so this architecture is validated continuously, not just at deployment. The design also protects against cases where a configuration change (possibly involving human error) ends up knocking out a large share of traffic. The investments for these changes were made over the last couple of weeks, so they were already in the works and are unrelated to the incidents in March. With summer right around the corner, we wanted to let you know you'll be in good hands.

    These changes will also allow DDoS mitigation changes to be rolled out in a more controlled way (e.g. to only part of the traffic), enable zero-downtime maintenance windows for core networking equipment, and provide a stronger foundation for the capacity expansions we have planned for the rest of 2026.

  • Resolved
    This incident has been resolved.
  • Monitoring

    The incident was resolved shortly after onset, and we have been monitoring since. A full RFO will be posted when this status is closed. In short, the root cause was a cascading failure triggered by a bug in the routing software itself — not by any action taken by our team. The issue was entirely outside our control, and we responded quickly to restore normal operation.

  • Identified

    We have identified the root cause and are working on a resolution. Recovery efforts are showing progress, and traffic is beginning to recover in Dallas.

  • Investigating

    We are currently investigating this incident.
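
As a rough illustration of the active/active idea described in the postmortem above, the toy model below shows how traffic share shifts when one router of a pair goes offline. It is a sketch under stated assumptions, not a description of our actual configuration: the `EdgeRouter` class, the `traffic_share` function, and the router names are hypothetical, an even split between live routers is assumed, and real reconvergence is driven by BGP rather than by code like this.

```python
from dataclasses import dataclass


@dataclass
class EdgeRouter:
    """Hypothetical stand-in for one edge router in a PoP."""
    name: str
    online: bool = True


def traffic_share(routers: list[EdgeRouter]) -> dict[str, float]:
    """Split traffic evenly across whichever routers are currently online."""
    live = [r for r in routers if r.online]
    if not live:
        return {}  # no router left to carry traffic
    return {r.name: 1.0 / len(live) for r in live}


pair = [EdgeRouter("edge-a"), EdgeRouter("edge-b")]
print(traffic_share(pair))   # both routers carry traffic

pair[0].online = False       # e.g. a full chassis reload on edge-a
print(traffic_share(pair))   # the remaining router absorbs everything
```

With a single router, the second case would return an empty result, which is the gap the dual-router design is meant to close.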

London Connectivity Issues
  • Resolved

    Following emergency maintenance yesterday that required a reboot of a core router in our London facility, an Arista runtime software bug caused the router's ARP entries to gradually decay from active memory.

    Although the router's configuration remained correct throughout, the hardware chip (ASIC) responsible for directing network traffic failed to correctly reload the address mappings after the reboot. These mappings are what tell the router how to reach a set of internal endpoints used for multicast traffic forwarding. With them missing from the hardware's active memory, traffic that should have been flowing through those paths was silently dropped.

    Because the configuration itself was never corrupted, the root cause was not immediately obvious. A number of other potential causes were investigated before the true issue was identified — a desync between the router's stored configuration and what the hardware had actually loaded into memory.

    We sincerely apologize for the impact this had on your services and for the time it took to identify the root cause. We understand how frustrating extended investigations can be, and we appreciate your patience while our engineers worked methodically through the contributing factors to reach a definitive resolution.

  • Monitoring

    We have implemented another round of fixes and connectivity is recovering. Please reach out to us if you are still having issues while we continue to monitor.

    Thank you again for your patience in this matter. We will provide a full report when we confirm all is well.

  • Update

    We are continuing to investigate TCP issues in the London PoP. We apologize for the continued problems today and are making progress toward a full resolution for this location.

  • Identified

    We are continuing to monitor reports of elevated issues and are still working towards a permanent resolution.

  • Monitoring

    We have rolled out a batch of fixes and are seeing connectivity recover. We are continuing to monitor the situation closely.

  • Identified

    We have identified an issue with our filtering software in our London PoP and are working on a resolution as quickly as possible.
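
The London resolution note above describes a desync between what the router was expected to hold and what the hardware had actually loaded. One generic way to surface that kind of mismatch is to diff the two views. The sketch below assumes both tables are already available as plain IP-to-MAC dictionaries; how they would be collected from a device is not covered, and the function name and sample addresses are hypothetical.

```python
def find_missing_entries(expected: dict[str, str],
                         programmed: dict[str, str]) -> dict[str, str]:
    """Return expected IP-to-MAC mappings that are absent from, or different in,
    the table the hardware actually loaded."""
    return {
        ip: mac
        for ip, mac in expected.items()
        if programmed.get(ip) != mac
    }


# Sample data using documentation-reserved addresses.
expected = {
    "192.0.2.10": "00:00:5e:00:53:01",
    "192.0.2.11": "00:00:5e:00:53:02",
}
programmed = {
    "192.0.2.10": "00:00:5e:00:53:01",
    # 192.0.2.11 was never loaded into hardware
}

for ip, mac in find_missing_entries(expected, programmed).items():
    print(f"{ip} expected at {mac} but not programmed in hardware")
```

A check like this only helps if the expected view comes from somewhere other than the hardware table itself, since in this incident the stored configuration looked correct throughout.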

Feb 2026

Upstream Issues
  • Resolved
  • Investigating

    We are currently investigating this incident. One of our upstreams is temporarily losing prefix announcements, and we are working with them on an emergency basis. You may see brief blips as traffic fails over and back.

Jan 2026 to Mar 2026