Today (Oct 12), between about 07:10 and 10:10 UTC, there was a large amount of replication lag (>6 seconds) for Commons, causing the database to intermittently go into read-only mode. See https://logstash.wikimedia.org/goto/2395ce2a93ebfab066887548f18ed62b and https://commons.wikimedia.org/wiki/Commons:Village_pump#Extended_read-only_of_Commons
Additionally, between 15:00 and the present (17:00), there seems to be a (much) smaller number of lag-related errors (https://logstash.wikimedia.org/goto/3b982cffdf77978581c578ac50e818e8). These were actually for db1034 (s7) and db1070 (s5). I suppose Commons may have been doing something Wikidata- or CentralAuth-related, which triggered read-only mode for the request to the foreign DB.

Edit: I got confused by Logstash here. db1034 had about 7.8 seconds of lag for a very short period of time (between 2017-10-13T01:42:38 and 2017-10-13T01:42:43). The other s7 DBs were fine, so read-only mode was not triggered. Similarly, db1070 intermittently had some bouts of lag, but this didn't cause any real problems because (I think) the other s5 replicas were fine.
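The reasoning in the edit above (one lagged replica is harmless as long as its siblings are fine) can be illustrated with a minimal sketch. It assumes, without confirming it against MediaWiki's actual load balancer code, that a section only falls back to read-only mode when no replica is within the lag limit; the 6-second threshold is taken from the lag observed in this report, and every replica name other than db1034 is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed lag threshold, based on the >6 s lag observed in this report.
MAX_LAG_SECONDS = 6.0


@dataclass
class Replica:
    name: str
    lag_seconds: float


def pick_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Return the least-lagged replica within the limit, or None if all are too lagged."""
    usable = [r for r in replicas if r.lag_seconds <= MAX_LAG_SECONDS]
    if not usable:
        return None
    return min(usable, key=lambda r: r.lag_seconds)


def is_read_only(replicas: List[Replica]) -> bool:
    # Assumption: read-only mode only kicks in when no replica is usable.
    return pick_replica(replicas) is None


# db1034 at 7.8 s of lag, with a hypothetical healthy sibling in s7:
s7 = [Replica("db1034", 7.8), Replica("hypothetical-s7-replica", 0.2)]
print(is_read_only(s7))
```

Under this assumption, the short db1034 spike would not put s7 into read-only mode, which matches what was seen.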
- Perhaps the purging of Commons recentchanges (rc) needs to go slower? (T177772)
- Shouldn't this have triggered some sort of alerting, especially during the 07:10-10:10 period? There were no IRC alerts at that time in the #wikimedia-operations channel. (I don't know how alerting for DBs is managed, so this comment may be silly.)
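One common way to make a large purge "go slower" is to delete in small batches and wait for replicas to catch up between batches. The sketch below is a generic illustration of that pattern, not how MediaWiki actually purges recentchanges: `delete_batch` and `max_replica_lag` are hypothetical callbacks standing in for the real delete query and lag check, and the batch size and lag limit are illustrative values only.

```python
import time
from typing import Callable, List


def purge_in_batches(
    ids: List[int],
    delete_batch: Callable[[List[int]], None],  # hypothetical: deletes one batch of rows
    max_replica_lag: Callable[[], float],  # hypothetical: worst replica lag in seconds
    batch_size: int = 500,  # illustrative value
    lag_limit: float = 1.0,  # illustrative value
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete rows in batches, pausing while replica lag exceeds lag_limit."""
    deleted = 0
    for start in range(0, len(ids), batch_size):
        # Block until replicas have caught up before writing the next batch.
        while max_replica_lag() > lag_limit:
            sleep(poll_interval)
        batch = ids[start:start + batch_size]
        delete_batch(batch)
        deleted += len(batch)
    return deleted
```

Smaller batches and a lower `lag_limit` make the purge take longer overall, but they keep each write burst small enough that replicas are less likely to drift past the read-only threshold.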