Error
Request ID: W6uv4ArAACkAAEOtbmAAAAAQ
A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? Query: UPDATE `page` SET page_touched = '20180926161220' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161219') Function: HTMLCacheUpdateJob::invalidateTitles Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.192.16.22)
#0 /srv/mediawiki/php-1.32.0-wmf.23/includes/libs/rdbms/database/Database.php(1428): Wikimedia\Rdbms\Database->makeQueryException(string, integer, string, string) #1 /srv/mediawiki/php-1.32.0-wmf.23/includes/libs/rdbms/database/Database.php(1198): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean) #2 /srv/mediawiki/php-1.32.0-wmf.23/includes/libs/rdbms/database/Database.php(2074): Wikimedia\Rdbms\Database->query(string, string) #3 /srv/mediawiki/php-1.32.0-wmf.23/includes/jobqueue/jobs/HTMLCacheUpdateJob.php(148): Wikimedia\Rdbms\Database->update(string, array, array, string) #4 /srv/mediawiki/php-1.32.0-wmf.23/includes/jobqueue/jobs/HTMLCacheUpdateJob.php(97): HTMLCacheUpdateJob->invalidateTitles(array) #5 /srv/mediawiki/php-1.32.0-wmf.23/extensions/EventBus/includes/JobExecutor.php(64): HTMLCacheUpdateJob->run() #6 /srv/mediawiki/rpc/RunSingleJob.php(67): JobExecutor->execute(array)
Notes
The example in question is for two pages on commons.wikimedia.org that were edited/created together by the same user as part of a the "Nominate for deletion" feature.
At the same time as the above error, the were also a bunch of other jobs failing with the same page_in IN() condition, but with similar or slightly different touch-timestamps. Which would explain why they all dead locks because they're all waiting for each other in some way, I guess.
[W6uv9wrAAC0AACBKWiwAAABE] Query: UPDATE `page` SET page_touched = '20180926161305' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161244') [W6uv4QrAEHUAAB1wFb8AAAAR] Query: UPDATE `page` SET page_touched = '20180926161305' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161242') [W6uv@ArAIGYAAHvODEAAAABV Query: UPDATE `page` SET page_touched = '20180926161305' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161243') [W6uv8ArAIF4AAK8uKoUAAAAX] Query: UPDATE `page` SET page_touched = '20180926161304' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161240') [W6uv9ArAAC4AAJzZqBAAAAAK] Query: UPDATE `page` SET page_touched = '20180926161304' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161242') ... [W6uv5wrAACgAAEOlUzMAAACP] Query: UPDATE `page` SET page_touched = '20180926161228' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161226') [W6uv5QrAIFoAAKdmghAAAAAA] Query: UPDATE `page` SET page_touched = '20180926161227' WHERE page_id IN ('71237625','72854789') AND (page_touched < '20180926161226')
All together 31 jobs were executing for the same set of page IDs all failing with deadlocks.
Looking at the past 30 days in WMF Logstash, the deadlock has happened before a few times (starting at August 25, which is today's log cut off so unrelated). However it has only happened for 1 or 2 jobs at most, never 31. That might mean the problem has gotten worse, or maybe it is just a coincidence.
In any event, deadlocks are bad.