Setting up a particular banner for all wikis and all users with 100% sampling caused instability in the traffic layer because of the large volume of traffic it generated. As a result, wikis were unreachable in the impacted region (initially users served by the esams datacenter, mostly in Europe, Africa and the Middle East), and there were some temporary issues at other datacenters (eqiad). The issues occurred because connections piled up all the way to the application layer (eventgate-analytics-external), and Varnish was not able to handle them.
- Public doc: https://wikitech.wikimedia.org/wiki/Incidents/2022-03-04_esams_availability_banner_sampling
- Internal doc: https://docs.google.com/document/d/1xYYzFlJcAP9pckqBWyiXUbs7HN5iThg85lkjv_RUh_o/edit
This ticket has been created to track the follow-ups, make sure the postmortem is documented on Wikitech in the usual places, and make sure the incident is scored appropriately.