Update: Incident report at https://wikitech.wikimedia.org/wiki/Incident_documentation/20180410-Routing
For about 54 minutes, from 22:46 to 23:40 on Tue 10 Apr 2018, a significant share of global traffic was unable to reach our data centres.
Varnish traffic (Grafana) |
---|
From 22:53 to 23:03 (10 min), "Transmission cp10xx (eqiad)" was down 70% (dropped from 10 Gbit/s to 3 Gbit/s). The trough lasted about 5 min (22:55 - 23:00). |
From 22:46 to 23:24 (about 40 min), "Transmission cp50x (eqsin)" was down about 90% (dropped from 1.6 Gbit/s to 0.12 Gbit/s). The trough lasted about 30 min (22:50 - 23:20). |
Edit count (Grafana) |
---|
Edit count (global) dropped from 800/min to 350/min (down 56%) |
Varnish http (Grafana) |
---|
Requests (total) dropped from 11M/min to 8M/min (down 30%) |
Asia page views (Grafana) |
---|
Page views (1:100 sampled) dropped from 170/min to <10/min (down more than 90%). This is based on client-side geolocation and indicates that the traffic was genuinely lost, rather than rerouted to another data centre. |
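The "down X%" figures above are relative drops, computed as (before - after) / before. As a minimal sketch, the Python below recomputes them from the before/after values quoted in the tables. The names `readings` and `percent_drop` are hypothetical, chosen for illustration. The inputs are approximate readings off the Grafana graphs and the quoted percentages are rounded, so results can differ by a few points (for example, 11M/min to 8M/min works out to about 27%).

```python
# Hypothetical helper: (before, after) values as quoted in the tables above.
# These are approximate readings from the Grafana graphs.
readings = {
    "Transmission cp10xx (eqiad), Gbit/s": (10, 3),
    "Transmission cp50x (eqsin), Gbit/s": (1.6, 0.12),
    "Edit count (global), edits/min": (800, 350),
    "Varnish requests (total), M/min": (11, 8),
    # "<10/min" is an upper bound, so the computed drop is a lower bound.
    "Asia page views (1:100 sampled), per min": (170, 10),
}


def percent_drop(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    return (1 - after / before) * 100


for name, (before, after) in readings.items():
    print(f"{name}: down {percent_drop(before, after):.1f}%")
```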