Failed deferred updates should be queued as jobs if possible (Deadlock from LinksUpdate in WikiPage::updateCategoryCounts)
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error

Request ID: W7bPwArAIFkAACfFSSoAAAAF

message
A database query error has occurred.

Query: UPDATE  `category` SET cat_pages = cat_pages - 1,cat_files = cat_files - 1 WHERE cat_title = '###'
Function: WikiPage::updateCategoryCounts
Error: 1213 Deadlock found when trying to get lock; try restarting transaction
stacktrace
#1 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(1224): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.32.0-wmf.24/includes/libs/rdbms/database/Database.php(2100): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/mediawiki/php-1.32.0-wmf.24/includes/page/WikiPage.php(3483): Wikimedia\Rdbms\Database->update(string, array, array, string)
#4 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/LinksUpdate.php(399): WikiPage->updateCategoryCounts(array, array, integer)
#5 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/LinksUpdate.php(294): LinksUpdate->updateCategoryCounts(array, array)
#6 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/LinksUpdate.php(175): LinksUpdate->doIncrementalUpdate()
#7 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/DeferredUpdates.php(268): LinksUpdate->doUpdate()
#8 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/DeferredUpdates.php(214): DeferredUpdates::runUpdate(LinksUpdate, Wikimedia\Rdbms\LBFactoryMulti, string, integer)
#9 /srv/mediawiki/php-1.32.0-wmf.24/includes/deferred/DeferredUpdates.php(134): DeferredUpdates::execute(array, string, integer)
#10 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(914): DeferredUpdates::doUpdates(string)
#11 /srv/mediawiki/php-1.32.0-wmf.24/includes/MediaWiki.php(734): MediaWiki->restInPeace(string, boolean)
#12 [internal function]: Closure$MediaWiki::doPostOutputShutdown()

Impact

An exception thrown during a deferred update causes that update to be skipped, with no retry. This may lead to database inconsistencies because the update was never applied, e.g. category queries returning incorrect or outdated results.
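To illustrate the behaviour this task asks for, here is a minimal Python sketch of a deferred-update runner that pushes a failed update onto a job queue instead of dropping it. It is not MediaWiki's implementation: the class names, the `as_job()` method, the job payload, and the in-memory queue are all hypothetical stand-ins chosen for the example.

```python
from collections import deque


class DeadlockError(Exception):
    """Stand-in for a database deadlock such as MySQL error 1213."""


class CategoryCountUpdate:
    """Hypothetical deferred update; `fail` simulates a deadlock."""

    def __init__(self, title, fail=False):
        self.title = title
        self.fail = fail

    def do_update(self):
        if self.fail:
            raise DeadlockError("Deadlock found when trying to get lock")

    def as_job(self):
        # Describe the work so a job runner could redo it later.
        # The payload shape is an assumption made for this sketch.
        return {"type": "refreshLinks", "title": self.title}


def run_deferred(updates, job_queue, error_log):
    """Run each update; on failure, log it and enqueue a job if possible."""
    for update in updates:
        try:
            update.do_update()
        except Exception as exc:
            error_log.append(repr(exc))
            # Updates that cannot describe themselves as a job are
            # still skipped, as before.
            as_job = getattr(update, "as_job", None)
            if as_job is not None:
                job_queue.append(as_job())


queue = deque()
errors = []
run_deferred(
    [CategoryCountUpdate("Foo", fail=True), CategoryCountUpdate("Bar")],
    queue,
    errors,
)
```

In this sketch the failed update for "Foo" ends up on the queue for a later attempt, while "Bar" completes inline and leaves no job behind.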

Notes

Reported in Logstash 71 times in the last 30 days, of which most were during requests that save an edit. In 14 cases it was from a RefreshLinks job on jobrunners instead.

In addition to the entry in the exception log, the DBQuery logs contain twice as many errors: each occurrence is logged exactly twice (142 entries for 71 exceptions). Not sure why that is.

See also:

Event Timeline

Krinkle triaged this task as Medium priority. Dec 19 2018, 11:43 PM

Querying +message:"WikiPage::updateCategoryCounts" in Logstash shows 35 reports in the last few days. Mainly enwiki and itwikisource, API requests and Jobs of type RefreshLinksJob.

Moving back to the pile for someone else or a later me to re-grab. Also relevant is https://gerrit.wikimedia.org/r/466981 (awaiting review), which might end up fixing this entirely.

Change 466981 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Make DeferredUpdates enqueue updates that failed to run when possible

https://gerrit.wikimedia.org/r/466981

This comment was removed by Krinkle.

Change 466981 merged by jenkins-bot:
[mediawiki/core@master] Make DeferredUpdates enqueue updates that failed to run when possible

https://gerrit.wikimedia.org/r/466981

This was reverted in 6b7ddf9c9bf / https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497324/.

Change 497537 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Make DeferredUpdates enqueue jobs to finish failed updates

https://gerrit.wikimedia.org/r/497537

Krinkle renamed this task from Deadlock exception from LinksUpdate in WikiPage::updateCategoryCounts to Failed deferred updates should be queued as jobs if possible (Deadlock from LinksUpdate in WikiPage::updateCategoryCounts). Aug 5 2019, 5:10 PM

Change 522233 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Make LinksUpdate no longer extend EnqueueableDataUpdate

https://gerrit.wikimedia.org/r/522233

Change 522234 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] Clean up DeferredUpdates transactions and push failed updates as jobs

https://gerrit.wikimedia.org/r/522234

Change 522233 merged by jenkins-bot:
[mediawiki/core@master] Make LinksUpdate no longer extend EnqueueableDataUpdate

https://gerrit.wikimedia.org/r/522233

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:08 PM

Change 538362 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Avoid using "enqueue" mode for deferred updates in doPostOutputShutdown()

https://gerrit.wikimedia.org/r/538362

Change 538362 merged by jenkins-bot:
[mediawiki/core@master] Avoid using "enqueue" mode for deferred updates in doPostOutputShutdown

https://gerrit.wikimedia.org/r/538362

These two are the main changes that would hopefully resolve the problem (pending CR and Wikibase testing):

Change 522234 had a related patch set uploaded (owner: Aaron Schulz):
[mediawiki/core@master] Let DeferredUpdates push failed updates as jobs

https://gerrit.wikimedia.org/r/522234

Change 497537 had a related patch set uploaded (owner: Aaron Schulz):
[mediawiki/core@master] Add RefreshSecondaryDataUpdate and use it in DerivedPageDataUpdater

https://gerrit.wikimedia.org/r/497537

daniel subscribed.

Looks like both linked tasks need review/merging. Bumping on clinic duty board. I'm out for ten days.

Change 522234 merged by jenkins-bot:
[mediawiki/core@master] Clean up DeferredUpdates transactions and push failed updates as jobs

https://gerrit.wikimedia.org/r/522234

@Krinkle @aaron Gerrit 522234 has been merged. As for Gerrit 497537, two things are missing: (i) Amir had a concern on the PS that went unaddressed; and (ii) it is not clear whether WB tests pass with that patch included or not.

> @Krinkle @aaron Gerrit 522234 has been merged. As for Gerrit 497537, two things are missing:
> (i) Amir had a concern on the PS that went unaddressed;

OK. I've signed the Gerrit change back over to @aaron.

> and (ii) it is not clear whether WB tests pass with that patch included or not.

We've confirmed at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/497443/ that WB tests pass with the commit included (dummy change on Wikibase with Depends-On to the core patch). Does that cover it, or are there other signals that suggested it might not pass?

> > @Krinkle @aaron Gerrit 522234 has been merged. As for Gerrit 497537, two things are missing:
> > (i) Amir had a concern on the PS that went unaddressed;
>
> OK. I've signed the Gerrit change back over to @aaron.

Great, thank you!

> > and (ii) it is not clear whether WB tests pass with that patch included or not.
>
> We've confirmed at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/497443/ that WB tests pass with the commit included (dummy change on Wikibase with Depends-On to the core patch). Does that cover it, or are there other signals that suggested it might not pass?

Yup, seen that, I just wanted to double-check with you and Aaron that the tests there did indeed include this core patch, so we are good here then.

Change 497537 merged by jenkins-bot:
[mediawiki/core@master] Add RefreshSecondaryDataUpdate and use it in DerivedPageDataUpdater

https://gerrit.wikimedia.org/r/497537