As explained in T206740, we found out that at the time of a MediaWiki switchover to eqiad, the eqiad parsercaches contained only stale data.
This ticket is to assess what were the consequences on the whole cluster that were caused directly by this event.
Some of the issues observed:
- MCS (and restbase as a consequence) timing out on specific enpoints in the swagger checks
- Semi-sporadic 503 spikes
- A shower of memcached errors
- Very high CPU usage on the appservers
- Queued requests in HHVM
- Sporadic 500s with timeouts related to entire web request took longer than 60 seconds and timed out in /srv/mediawiki/php-1.32.0-wmf.24/includes/parser/Preprocessor_Hash.php
We need to analyze all of those and see what can be explained by the parsercache fiasco and what cannot.