This task tracks the investigation of a partial outage of the MediaWiki API cluster, which started at 15:18 on 2016-10-17 and ended at XXX.
What happened:
- Requests piled up on individual app servers.
- App servers eventually became unresponsive and were depooled.
- The number of pooled API backends shrank until it could no longer meet demand
- Users were served 5xx errors.