During the incident documented at https://wikitech.wikimedia.org/wiki/Incident_documentation/20150103-Parsoid, requests for a specific page appear to have been retried a large number of times. Because requests for this page locked up Parsoid workers, the retries quickly overloaded the Parsoid cluster.
We should check why so many retries are happening. Things to look into:
- Varnish backend retries on timeout (in both the frontend and backend Varnish layers)
- Parsoid job retries
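
To illustrate why retries at several layers are worth checking together, here is a rough sketch of how independent retries could multiply. It assumes each layer retries on its own and that every attempt at one layer triggers a full round of attempts at the next one. The function name and the retry counts are hypothetical placeholders, not values taken from the Varnish or Parsoid configuration or measured during the incident.

```python
from math import prod


def worst_case_attempts(retries_per_layer):
    """Worst-case number of requests reaching a Parsoid worker for one page,
    assuming every layer retries independently on each failure."""
    # Each layer makes 1 initial attempt plus its retries, and every attempt
    # at one layer causes a full round of attempts at the layer behind it.
    return prod(1 + retries for retries in retries_per_layer.values())


# Hypothetical retry counts, for illustration only (not real configuration).
layers = {
    "parsoid_job_retries": 3,
    "varnish_frontend_retries": 1,
    "varnish_backend_retries": 1,
}

print(worst_case_attempts(layers))
```

Under these assumed counts, a single failing page would tie up a worker (1 + 3) * (1 + 1) * (1 + 1) = 16 times, so even small per-layer retry settings may add up to a large total.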
Possibly related: T73853