While 1.31.0-wmf.27 was deployed to group1, implicit temporary table creation shot up to roughly 4x the baseline on db1109, as can be seen in [[ https://grafana-admin.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1109&var-port=9104&from=1522263715035&to=1522265489680 | this Grafana view ]].
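The counters behind that panel are presumably MySQL's `Created_tmp_tables` / `Created_tmp_disk_tables` status variables (the URL points at the Prometheus MySQL exporter on port 9104). As a sanity check independent of Grafana, the same counters can be read through MediaWiki's rdbms layer; a minimal sketch, assuming a maintenance-shell context such as `eval.php`:

```lang=php
<?php
// Sketch only: read the implicit temporary-table counters that the
// Grafana panel presumably graphs. Assumes a MediaWiki maintenance
// shell; the SQL itself is plain MySQL.
use MediaWiki\MediaWikiServices;

$dbr = MediaWikiServices::getInstance()
	->getDBLoadBalancer()
	->getConnection( DB_REPLICA );

$res = $dbr->query( "SHOW GLOBAL STATUS LIKE 'Created_tmp%'", __METHOD__ );
foreach ( $res as $row ) {
	// Created_tmp_tables counts implicit temp tables;
	// Created_tmp_disk_tables counts the subset that spilled to disk.
	echo "{$row->Variable_name}: {$row->Value}\n";
}
```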
The error comes from [[ https://github.com/wikimedia/mediawiki/blob/master/includes/jobqueue/jobs/RefreshLinksJob.php#L258 | line 258 of RefreshLinksJob.php ]], where the code calls `commitAndWaitForReplication()`:
`$lbFactory->commitAndWaitForReplication( __METHOD__, $ticket );`
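For context, this is the standard transaction-ticket pattern MediaWiki uses around batched writes. A minimal sketch of that pattern, assuming a generic batched job (`$batches` is a placeholder, not RefreshLinksJob's actual structure):

```lang=php
<?php
// Sketch of the transaction-ticket pattern used around batched writes
// in MediaWiki jobs; $batches is illustrative only.
use MediaWiki\MediaWikiServices;

$lbFactory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
// The ticket asserts that no transaction round is pending when the
// caller later commits and waits.
$ticket = $lbFactory->getEmptyTransactionTicket( __METHOD__ );

foreach ( $batches as $batch ) {
	// ... perform one batch of link-table writes on the master ...

	// Commit the open transactions and block until replicas have caught
	// up. If a replica connection drops during that wait, the rdbms
	// layer raises the DBExpectedError "Replication wait failed: Lost
	// connection to MySQL server during query" seen in the timeline.
	$lbFactory->commitAndWaitForReplication( __METHOD__, $ticket );
}
```

The error message suggests the failure happened in the wait on a replica's position rather than in the master writes themselves.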
== Timeline ==
| Time | Who | Entry |
| ----- | ----- | ----- |
| 19:24 | twentyafterfour@tin | Synchronized php: group1 wikis to 1.31.0-wmf.26 (duration: 01m 17s) |
| 19:22 | twentyafterfour@tin | rebuilt and synchronized wikiversions files: group1 wikis to 1.31.0-wmf.26 |
| 19:20 | twentyafterfour | Rolling back to wmf.26 due to increase in fatals: "Replication wait failed: lost connection to MySQL server during query" |
| 19:19 | twentyafterfour | rolling back to wmf.26 |
| 19:18 | icinga-wm | PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen |
| 19:17 | twentyafterfour | I'm seeing quite a few "[{exception_id}] {exception_url} Wikimedia\Rdbms\DBExpectedError: Replication wait failed: Lost connection to MySQL server during query" |
| 19:12 | milimetric@tin | Finished deploy [analytics/refinery@c22fd1e]: Fixing python import bug (duration: 02m 48s) |
| 19:09 | milimetric@tin | Started deploy [analytics/refinery@c22fd1e]: Fixing python import bug |
| 19:09 | milimetric@tin | Started deploy [analytics/refinery@c22fd1e]: (no justification provided) |
| 19:06 | twentyafterfour@tin | Synchronized php: group1 wikis to 1.31.0-wmf.27 (duration: 01m 17s) |
| 19:05 | twentyafterfour@tin | rebuilt and synchronized wikiversions files: group1 wikis to 1.31.0-wmf.27 |