Our explanation:
Over the past few weeks, some of you may have experienced brief service interruptions across parts of our network. We want to be upfront about what happened, what we've learned, and most importantly what we've done about it.
Our edge routers have always been built with internal redundancy in mind - redundant supervisors, redundant power supplies, multiple line cards, and redundant fabric modules. That level of hardware resilience handles the vast majority of failure scenarios well.
However, the recent outages exposed a gap: when an issue affects the chassis itself such as a software defect, a firmware upgrade that requires a full reload, or cases like where a software process on a router crashes (as has happened recently in London -> Twice) - there was no second device to immediately absorb the traffic. The router was redundant in every way except the one that mattered in these incidents.
What we're doing:
We're rolling out a dual router design across all six of our points of presence — Dallas, Ashburn, Los Angeles, London, Amsterdam, and Frankfurt. Once complete, every PoP will operate with two independent edge routers in an active/active configuration, with full BGP session redundancy to all upstream and peering partners. If an entire chassis needs to be taken offline for maintenance, a software upgrade, or an unexpected failure then traffic will automatically reconverge on the second device with no customer-facing impact.
Each router in the pair will run on independent power feeds with independent management and control planes. We're also using this as an opportunity to standardize failover testing procedures across all PoPs, so this architecture is validated continuously, not just at deployment. This also provides protection against cases where a configuration change (with possibly human error involved) leads to a change which ends up knocking out a bunch of traffic. The investments for these changes were made during the last couple of weeks, so were already in the works and unrelated to incidents in March, but with summer right around the corner we wanted to let you know you'll be in good hands.
These changes will also allow for things like DDoS mitigation changes to be performed in a more controlled rollout (e.g. to parts of traffic only), zero-downtime maintenance windows for core networking equipment and a stronger foundation for the capacity expansions we have planned for the rest of 2026.