On 14 April, a refactor of mediawiki-BagOStuff was deployed that introduced a bug: revision text blobs were no longer cached in Memcached. Over time, traffic to the External Stores (wiki content databases) grew until the database infrastructure came close to failing:
https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=1&orgId=1&var-site=eqiad&var-group=core&var-shard=es1&var-shard=es2&var-shard=es3&var-shard=es4&var-shard=es5&var-role=All&from=1618370189989&to=1619816362678
https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=1&orgId=1&from=now-30d&to=now&var-site=eqiad&var-group=core&var-shard=All&var-role=All
More details on: https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-04-29_db_and_memc_load and T281480
The issue could have been detected earlier if there had been some kind of rate-of-change, week-over-week, or predictive alerting on queries per second (QPS).
Identify the best way to monitor this so that alarms are meaningful and do not produce false positives (alert spam for non-impacting changes). If there is a reasonable solution, implement it in production.
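
As a starting point for discussion, the week-over-week idea could be sketched as below. This is only an illustration under assumptions, not a proposed implementation: the function name, the 1.5x ratio threshold, and the "three consecutive samples" rule are hypothetical, and the QPS samples are assumed to be already fetched from the metrics system and aligned by time of week. Requiring several consecutive breaches is one possible way to avoid alerting on short, non-impacting spikes.

```python
from typing import Sequence


def should_alert(
    current_qps: Sequence[float],
    previous_week_qps: Sequence[float],
    max_ratio: float = 1.5,   # hypothetical threshold, needs tuning
    min_sustained: int = 3,   # hypothetical: consecutive breaches required
) -> bool:
    """Return True if QPS exceeded last week's value by more than
    max_ratio for at least min_sustained consecutive samples."""
    if len(current_qps) != len(previous_week_qps):
        raise ValueError("series must be aligned and of equal length")

    run = 0  # length of the current streak of breaching samples
    for now, before in zip(current_qps, previous_week_qps):
        # Skip comparison when there is no usable baseline value.
        if before > 0 and now / before > max_ratio:
            run += 1
            if run >= min_sustained:
                return True
        else:
            run = 0
    return False


# Made-up sample data, for illustration only.
last_week = [1000, 1100, 1050, 1000, 1020]
short_spike = [1000, 2500, 1050, 1000, 1020]
sustained_growth = [1000, 1800, 1900, 2100, 2200]

print(should_alert(short_spike, last_week))
print(should_alert(sustained_growth, last_week))
```

With the sample data above, the single-sample spike does not trigger an alert, while the sustained increase does. Real thresholds would need to be tuned against historical QPS for each shard.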