
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope
Open, Unbreak Now! Public

Description

This, and its similar issues:

Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GlobalUsage::deleteLinksFromPage does not have outer scope.
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope.
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GlobalUsage::insertLinks does not have outer scope.

and maybe related

Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GlobalUsage::insertLinks does not have outer scope.
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: MediaWiki\Extension\PageAssessments\PageAssessmentsDAO::doUpdates does not have outer scope
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: WatchedItemStore::removeWatchBatchForUser does not have outer scope

This is failing between 2000 and 6000 times per hour (at edit rate); that figure is for the error in the title only, and the others have similar rates. Because GeoData rarely gets updated, and the same transaction error appears in other extensions, this may be an rdbms mistake?

First issue on 2019-04-17T15:33:21, ramping up at 2019-04-17T19:12:50, and ramping up again at 2019-04-18T20:55:25.

Event Timeline

jcrespo created this task. Tue, Apr 23, 7:38 AM
Restricted Application added a project: Discovery-Search. Tue, Apr 23, 7:38 AM
Restricted Application added a subscriber: Aklapper.
Restricted Application added projects: Multimedia, Community-Tech. Tue, Apr 23, 7:48 AM
jcrespo updated the task description.
Restricted Application added a project: Growth-Team. Tue, Apr 23, 7:50 AM
jcrespo updated the task description. Tue, Apr 23, 8:06 AM
jcrespo triaged this task as Unbreak Now! priority. Tue, Apr 23, 8:13 AM

Raising to Unbreak Now!, because as far as I can see all edit hooks may be broken, causing long-lasting issues with the metadata. Change if that is not true.

Restricted Application added subscribers: Liuxinyu970226, TerraCodes. Tue, Apr 23, 8:13 AM
jcrespo updated the task description. Tue, Apr 23, 8:14 AM
MaxSem added a subscriber: aaron. Tue, Apr 23, 8:33 AM
Ramsey-WMF moved this task from Untriaged to Tracking on the Multimedia board. Tue, Apr 23, 5:22 PM

This is generating around 80k errors per day.

Krinkle moved this task from Backlog to libs/rdbms on the MediaWiki-Database board.
kchapman assigned this task to aaron. Mon, Apr 29, 7:58 PM
kchapman moved this task from Inbox to Doing on the Performance-Team board.
aaron added a comment. Tue, Apr 30, 4:46 AM

https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497537/ might fix that (by giving each update its own transaction round), but that patch still has some CI issues.

Change 497537 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Make DeferredUpdates enqueue jobs to finish failed updates

https://gerrit.wikimedia.org/r/497537

Is it worth back-porting that patch? Or is it too risky?

aaron added a comment. Fri, May 3, 7:22 PM

Is it worth back-porting that patch? Or is it too risky?

It's probably not worth it for third parties (unless they also have huge wikis where problems will happen if some of these lag wait sync points are skipped).

Krinkle added a subscriber: Krinkle. Sat, May 4, 4:24 PM

@aaron That commit introduces new behaviour we've never had before. That seems like an unlikely fix for this issue, unless nothing changed on the software side and this is entirely a change in user behaviour that makes the error more common than before.

Is that the case here? The specific error here about transactions not having "outer scope" does not appear to be a race condition like the other tasks the proposed commit addresses (which suffer from lock contention). Rather, it appears to be a deterministic problem due to fname strings not matching.

Have we ruled out our recent changes to LinksUpdate and Job execution logic as the cause of this error?

aaron added a comment. Edited Tue, May 7, 4:29 AM

It's the new use of DeferredUpdates::attemptUpdate/addCallableUpdate in that change instead of plain doUpdate() calls that gives them outer scope.
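For readers unfamiliar with the "outer scope" wording, here is a rough toy model in Python of the kind of check that produces it. This is not the real Wikimedia\Rdbms code: the class `ToyLBFactory`, its methods, and the rule "refuse a ticket while uncommitted writes exist" are all assumptions made for illustration only.

```python
class ToyLBFactory:
    """Hypothetical stand-in for a load-balancer factory (not the real LBFactory)."""

    def __init__(self):
        self._ticket = object()   # opaque token for callers that own the scope
        self._pending_writes = 0  # writes made since the last commit
        self.errors = []          # collected "outer scope" messages

    def write(self):
        # Simulate a write that has not been committed yet.
        self._pending_writes += 1

    def commit(self):
        # Simulate committing everything that is pending.
        self._pending_writes = 0

    def get_empty_transaction_ticket(self, fname):
        # Assumption: a ticket is only handed out when nothing is pending,
        # i.e. when the caller is effectively the outermost writer.
        if self._pending_writes:
            self.errors.append(f"{fname} does not have outer scope")
            return None
        return self._ticket


factory = ToyLBFactory()
first = factory.get_empty_transaction_ticket("GeoData\\Hooks::doLinksUpdate")
factory.write()  # an earlier update leaves uncommitted changes behind
second = factory.get_empty_transaction_ticket("GlobalUsage::insertLinks")
```

In this sketch `first` is a usable ticket, while `second` is `None` and one message is recorded, because the second caller runs while someone else's writes are still uncommitted.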

JTannerWMF added a subscriber: JTannerWMF.

Looks like Performance-Team is working on this so I am removing the corresponding Growth team tags.

It's the new use of DeferredUpdates::attemptUpdate/addCallableUpdate in that change instead of plain doUpdate() calls that gives them outer scope.

I see. The inline loop in DerivedPageDataUpdater for running updates wasn't mimicking that part of DeferredUpdates::runUpdate, which it needs to.
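As a hedged illustration of the difference being described, the following Python sketch compares an inline loop that just calls each update with a runner that wraps every update in its own transaction round. All names here (`ToyFactory`, `ToyUpdate`, `run_inline`, `run_with_rounds`) are hypothetical, and the behaviour is a simplified assumption, not the actual DerivedPageDataUpdater or DeferredUpdates code.

```python
class ToyFactory:
    """Hypothetical factory tracking pending writes and the current round owner."""

    def __init__(self):
        self.round_owner = None
        self.pending_writes = 0
        self.errors = []

    def begin_round(self, owner):
        self.round_owner = owner

    def commit_round(self, owner):
        # Only the owner that began the round may commit it.
        assert self.round_owner == owner
        self.pending_writes = 0
        self.round_owner = None

    def write(self):
        self.pending_writes += 1

    def get_ticket(self, fname):
        # Assumption: tickets are refused while uncommitted writes exist.
        if self.pending_writes:
            self.errors.append(f"{fname} does not have outer scope")
            return None
        return "ticket"


class ToyUpdate:
    def __init__(self, name, factory):
        self.name = name
        self.factory = factory
        self.got_ticket = False

    def do_update(self):
        ticket = self.factory.get_ticket(f"{self.name}::doUpdate")
        self.got_ticket = ticket is not None
        self.factory.write()


def run_inline(updates):
    # Plain loop: no round per update, so writes pile up across updates.
    for update in updates:
        update.do_update()


def run_with_rounds(updates, factory):
    # Each update gets its own begin/commit pair around do_update().
    for update in updates:
        owner = f"{update.name}::doUpdate"
        factory.begin_round(owner)
        update.do_update()
        factory.commit_round(owner)


names = ["GeoData", "GlobalUsage", "PageAssessments"]

inline_factory = ToyFactory()
inline_updates = [ToyUpdate(n, inline_factory) for n in names]
run_inline(inline_updates)
inline_results = [u.got_ticket for u in inline_updates]

round_factory = ToyFactory()
round_updates = [ToyUpdate(n, round_factory) for n in names]
run_with_rounds(round_updates, round_factory)
round_results = [u.got_ticket for u in round_updates]
```

Under these assumptions only the first update in the inline loop gets a ticket, while every update run through `run_with_rounds` gets one, which is the gist of why the loop needs to mimic the per-update round handling.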

Could you extract that bug fix into its own commit, and reference what caused it? That'll also make it easier to test and verify in context.