Failed deferred updates should be queued as jobs if possible (Deadlock from LinksUpdate in WikiPage::updateCategoryCounts)
Open, Normal, Public

Description

Error

Request ID: W7bPwArAIFkAACfFSSoAAAAF

message
A database query error has occurred.

Query: UPDATE  `category` SET cat_pages = cat_pages - 1,cat_files = cat_files - 1 WHERE cat_title = '###'
Function: WikiPage::updateCategoryCounts
Error: 1213 Deadlock found when trying to get lock; try restarting transaction
stacktrace
#1 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(1224): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(2100): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(3483): Wikimedia\Rdbms\Database->update(string, array, array, string)
#4 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/LinksUpdate.php(399): WikiPage->updateCategoryCounts(array, array, integer)
#5 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/LinksUpdate.php(294): LinksUpdate->updateCategoryCounts(array, array)
#6 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/LinksUpdate.php(175): LinksUpdate->doIncrementalUpdate()
#7 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/DeferredUpdates.php(268): LinksUpdate->doUpdate()
#8 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/DeferredUpdates.php(214): DeferredUpdates::runUpdate(LinksUpdate, Wikimedia\Rdbms\LBFactoryMulti, string, integer)
#9 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/DeferredUpdates.php(134): DeferredUpdates::execute(array, string, integer)
#10 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(914): DeferredUpdates::doUpdates(string)
#11 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(734): MediaWiki->restInPeace(string, boolean)
#12 [internal function]: Closure$MediaWiki::doPostOutputShutdown()

Impact

An exception is thrown during a deferred update, which is then skipped and never retried. This may lead to database inconsistencies because the update was never applied, e.g. category queries returning incorrect or outdated results.
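To make the failure mode and the proposed remedy concrete, here is a minimal, language-neutral sketch in Python. It is not MediaWiki's DeferredUpdates code; `Update`, `Runner`, `DeadlockError`, and the `enqueueable` flag are hypothetical names chosen for illustration. It assumes that an update which fails post-send can be described by name and retried later from a job queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


class DeadlockError(Exception):
    """Hypothetical stand-in for a database error such as MySQL error 1213."""


@dataclass
class Update:
    name: str
    run: Callable[[], None]
    enqueueable: bool = True  # assumption: the update can be re-expressed as a job


@dataclass
class Runner:
    job_queue: Deque[str] = field(default_factory=deque)
    lost: List[str] = field(default_factory=list)

    def run_deferred(self, updates: List[Update]) -> None:
        """Run each update; on failure, queue it as a job instead of dropping it."""
        for update in updates:
            try:
                update.run()
            except Exception:
                if update.enqueueable:
                    # A job runner can retry this later.
                    self.job_queue.append(update.name)
                else:
                    # The situation described above: skipped with no retry.
                    self.lost.append(update.name)


def failing_links_update() -> None:
    raise DeadlockError("1213 Deadlock found when trying to get lock")


completed: List[str] = []
runner = Runner()
runner.run_deferred([
    Update("links-update", failing_links_update),
    Update("other-update", lambda: completed.append("other-update")),
])
```

In this sketch the failed update ends up in `runner.job_queue` rather than being lost, and the later update still runs. A real implementation would also have to deal with transaction rollback and duplicate execution, which the sketch ignores.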

Notes

Reported in Logstash 71 times in the last 30 days, of which most were during requests that save an edit. In 14 cases it was from a RefreshLinks job on jobrunners instead.

In addition to the entries in the exception log, the DBQuery log contains twice as many errors: each occurrence is logged exactly twice (142 entries for the 71 exceptions). Not sure why that is.

Event Timeline

Krinkle created this task. Oct 5 2018, 3:16 AM
Restricted Application added a subscriber: Aklapper. Oct 5 2018, 3:16 AM
Imarlier assigned this task to Krinkle. Dec 5 2018, 4:42 PM
Imarlier moved this task from Next In This Quarter to Doing on the Performance-Team board.
Krinkle triaged this task as Normal priority. Dec 19 2018, 11:43 PM

Querying +message:"WikiPage::updateCategoryCounts" in Logstash shows 35 reports in the last few days. They come mainly from enwiki and itwikisource, in API requests and in jobs of type RefreshLinksJob.

Krinkle removed Krinkle as the assignee of this task. Jan 22 2019, 7:16 PM
Krinkle moved this task from Doing to Backlog: Small & Maintenance on the Performance-Team board.

Moving back to the pile for someone else or a later me to re-grab. Also relevant is https://gerrit.wikimedia.org/r/466981 (awaiting review), which might end up fixing this entirely.

Change 466981 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Make DeferredUpdates enqueue updates that failed to run when possible

https://gerrit.wikimedia.org/r/466981

This comment was removed by Krinkle.

Change 466981 merged by jenkins-bot:
[mediawiki/core@master] Make DeferredUpdates enqueue updates that failed to run when possible

https://gerrit.wikimedia.org/r/466981

This was reverted in 6b7ddf9c9bf / https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497324/.

Change 497537 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Make DeferredUpdates enqueue jobs to finish failed updates

https://gerrit.wikimedia.org/r/497537

Krinkle assigned this task to aaron. Apr 23 2019, 7:31 PM
Krinkle renamed this task from "Deadlock exception from LinksUpdate in WikiPage::updateCategoryCounts" to "Failed deferred updates should be queued as jobs if possible (Deadlock from LinksUpdate in WikiPage::updateCategoryCounts)". Aug 5 2019, 5:10 PM

Change 522233 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Make LinksUpdate no longer extend EnqueueableDataUpdate

https://gerrit.wikimedia.org/r/522233

Change 522234 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Clean up DeferredUpdates transactions and push failed updates as jobs

https://gerrit.wikimedia.org/r/522234

Change 522233 merged by jenkins-bot:
[mediawiki/core@master] Make LinksUpdate no longer extend EnqueueableDataUpdate

https://gerrit.wikimedia.org/r/522233

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:08 PM

Change 538362 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Avoid using "enqueue" mode for deferred updates in doPostOutputShutdown()

https://gerrit.wikimedia.org/r/538362

Change 538362 merged by jenkins-bot:
[mediawiki/core@master] Avoid using "enqueue" mode for deferred updates in doPostOutputShutdown

https://gerrit.wikimedia.org/r/538362

These two are the main changes that would hopefully resolve the problem (pending CR and Wikibase testing):

Change 522234 had a related patch set uploaded (owner: Aaron Schulz):
[mediawiki/core@master] Let DeferredUpdates push failed updates as jobs
https://gerrit.wikimedia.org/r/522234

Change 497537 had a related patch set uploaded (owner: Aaron Schulz):
[mediawiki/core@master] Add RefreshSecondaryDataUpdate and use it in DerivedPageDataUpdater
https://gerrit.wikimedia.org/r/497537

daniel added a subscriber: daniel.

Looks like both linked changes still need review and merging. Bumping this on the clinic duty board. I'm out for ten days.

Change 522234 merged by jenkins-bot:
[mediawiki/core@master] Clean up DeferredUpdates transactions and push failed updates as jobs

https://gerrit.wikimedia.org/r/522234

@Krinkle @aaron Gerrit 522234 has been merged. As for Gerrit 497537, two things are missing: (i) Amir raised a concern on the patch set that went unaddressed; and (ii) it is not clear whether the Wikibase tests pass with that patch included.