Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: LinksUpdate does not have outer scope
Closed, Resolved · Public · Production Error

Description

This, and its similar issues:

Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GlobalUsage::deleteLinksFromPage does not have outer scope.
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope.
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GlobalUsage::insertLinks does not have outer scope.

and maybe related

Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GlobalUsage::insertLinks does not have outer scope.
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: MediaWiki\Extension\PageAssessments\PageAssessmentsDAO::doUpdates does not have outer scope
Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: WatchedItemStore::removeWatchBatchForUser does not have outer scope

This is failing between 2000 and 6000 times per hour (at edit rate). [This is for the one in the title only; the others have similar rates.] Because GeoData rarely gets updated, and the same transaction error occurs in other extensions, might this be an rdbms mistake?

First issue on 2019-04-17T15:33:21, ramping up at 2019-04-17T19:12:50, and ramping up again at 2019-04-18T20:55:25.

Event Timeline

Restricted Application added a subscriber: Aklapper.
jcrespo triaged this task as Unbreak Now! priority. Apr 23 2019, 8:13 AM

Raising to Unbreak Now! because, as far as I can see, all edit hooks may be broken, causing long-lasting issues with the metadata. Change the priority if that is not true.

This is generating around 80k errors per day.

https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497537/ might fix that (by giving each update its own transaction round), but that patch still has some CI issues.

Change 497537 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Make DeferredUpdates enqueue jobs to finish failed updates

https://gerrit.wikimedia.org/r/497537

Is it worth back-porting that patch? Or is it too risky?

It's probably not worth it for third parties (unless they also have huge wikis where problems will happen if some of these lag wait sync points are skipped).

@aaron That commit introduces new behaviour we've never had before. It seems an unlikely fix for this issue, unless nothing changed on the software side and this is entirely a change in user behaviour that makes the issue more common than before.

Is that the case here? The specific error here about transactions not having "outer scope" does not appear to be a race condition like the other tasks the proposed commit addresses (which suffer from lock contention). Rather, it appears to be a deterministic problem due to fname strings not matching.

Have we ruled out our recent changes to LinksUpdate and Job execution logic as cause for this error?

It's the new use of DeferredUpdates::attemptUpdate/addCallableUpdate in that change instead of plain doUpdate() calls that gives them outer scope.

JTannerWMF added a subscriber: JTannerWMF.

Looks like Performance-Team is working on this so I am removing the corresponding Growth team tags.

It's the new use of DeferredUpdates::attemptUpdate/addCallableUpdate in that change instead of plain doUpdate() calls that gives them outer scope.

I see. The inline loop in DerivedPageDataUpdater for running updates wasn't mimicking that part of DeferredUpdates::runUpdate, which it needs to.

Could you extract that bug fix into its own commit, and reference what caused it? That'll also make it easier to test and verify in context.
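The difference between running an update inline and giving it its own transaction round, as discussed above, can be illustrated with a toy model. This is a sketch in Python, not MediaWiki code: `ToyLBFactory`, `links_update`, `run_inline` and `run_each_in_own_round` are hypothetical names, and the rule that a ticket is refused while uncommitted writes are pending is a simplifying assumption about how the "outer scope" check behaves. It does not claim to reproduce the actual cause fixed by the patch.

```python
class ToyLBFactory:
    """Toy stand-in for a load-balancer factory (hypothetical, not the real API)."""

    def __init__(self):
        self.pending_writes = False  # uncommitted writes in the current round
        self.errors = []             # collected "outer scope" log messages
        self._ticket = object()      # the only valid ticket value

    def write(self):
        self.pending_writes = True

    def commit(self):
        self.pending_writes = False

    def get_empty_transaction_ticket(self, fname):
        # Assumption: a ticket is only handed out when nothing written by an
        # enclosing caller is still uncommitted.
        if self.pending_writes:
            self.errors.append(
                f"getEmptyTransactionTicket: {fname} does not have outer scope"
            )
            return None
        return self._ticket

    def commit_and_wait_for_replication(self, fname, ticket):
        # Without a valid ticket the commit is refused and an error is logged.
        if ticket is not self._ticket:
            self.errors.append(
                f"commitAndWaitForReplication: {fname} does not have outer scope"
            )
            return False
        self.commit()
        return True


def links_update(factory):
    """A batched update that wants to commit part-way through its work."""
    ticket = factory.get_empty_transaction_ticket("LinksUpdate")
    factory.write()
    return factory.commit_and_wait_for_replication("LinksUpdate", ticket)


def run_inline(factory, updates):
    # The parent has uncommitted writes and calls each update directly.
    factory.write()
    results = [update(factory) for update in updates]
    factory.commit()
    return results


def run_each_in_own_round(factory, updates):
    # The parent's work is committed first; each update then starts clean.
    factory.write()
    factory.commit()
    results = []
    for update in updates:
        results.append(update(factory))
        factory.commit()
    return results
```

In this model, `run_inline` leaves the update without a ticket, so both the ticket request and the later commit attempt are logged as lacking outer scope, while `run_each_in_own_round` lets the same update commit cleanly.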

akosiaris lowered the priority of this task from Unbreak Now! to High. May 22 2019, 12:59 PM
akosiaris added a subscriber: akosiaris.

I am lowering to High, just in the interest of not abusing Unbreak Now!, since this task has been in this state since Apr 23. That being said, this indeed needs to be resolved ASAP.

So, we have a potential fix, but it doesn't seem to work with Wikibase? Does the Wikidata team know?

I'm seeing this consistently when saving edits to the test2.wikipedia.org Sandbox (when adding or removing a transclusion):

type: mediawiki
channel: DBPerformance
level: ERROR

Wikimedia\Rdbms\LBFactory::commitAndWaitForReplication: LinksUpdate::incrTableUpdate does not have outer scope.

#0 /srv/mediawiki/php-1.34.0-wmf.6/includes/deferred/LinksUpdate.php(494): Wikimedia\Rdbms\LBFactory->commitAndWaitForReplication('LinksUpdate::in...', NULL, Array)
#1 /srv/mediawiki/php-1.34.0-wmf.6/includes/deferred/LinksUpdate.php(282): LinksUpdate->incrTableUpdate('templatelinks', 'tl', Array, Array)
#2 /srv/mediawiki/php-1.34.0-wmf.6/includes/deferred/LinksUpdate.php(189): LinksUpdate->doIncrementalUpdate()
#3 /srv/mediawiki/php-1.34.0-wmf.6/includes/Storage/DerivedPageDataUpdater.php(1622): LinksUpdate->doUpdate()
#4 /srv/mediawiki/php-1.34.0-wmf.6/includes/Storage/DerivedPageDataUpdater.php(1429): MediaWiki\Storage\DerivedPageDataUpdater->doSecondaryDataUpdates(Array)
#5 /srv/mediawiki/php-1.34.0-wmf.6/includes/deferred/MWCallableUpdate.php(34): MediaWiki\Storage\DerivedPageDataUpdater->MediaWiki\Storage\{closure}()
#6 /srv/mediawiki/php-1.34.0-wmf.6/includes/deferred/DeferredUpdates.php(274): MWCallableUpdate->doUpdate()
#7 /srv/mediawiki/php-1.34.0-wmf.6/includes/deferred/DeferredUpdates.php(219): DeferredUpdates::runUpdate(Object(MWCallableUpdate), Object(Wikimedia\Rdbms\LBFactoryMulti), 'run', 2)
#8 /srv/mediawiki/php-1.34.0-wmf.6/includes/deferred/DeferredUpdates.php(143): DeferredUpdates::execute(Array, 'run', 2)
#9 /srv/mediawiki/php-1.34.0-wmf.6/includes/MediaWiki.php(907): DeferredUpdates::doUpdates('run')
#10 /srv/mediawiki/php-1.34.0-wmf.6/includes/MediaWiki.php(731): MediaWiki->restInPeace('normal', false)
#11 /srv/mediawiki/php-1.34.0-wmf.6/includes/MediaWiki.php(754): MediaWiki->{closure}()
#12 /srv/mediawiki/php-1.34.0-wmf.6/includes/MediaWiki.php(548): MediaWiki->doPostOutputShutdown('normal')
#13 /srv/mediawiki/php-1.34.0-wmf.6/index.php(42): MediaWiki->run()

I don't know whether it is the same issue, but core updates being broken definitely seems like a regression. If this is the same issue as reported in this task, I'd like to ask again whether we can identify what recently caused this regression and focus on that separately from the "Retry failed DeferredUpdates" refactor, so that we can reduce risk from the refactor by letting it ride the train and beta phases normally (in addition to it currently being blocked on Wikibase compatibility).

Krinkle renamed this task from Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: GeoData\Hooks::doLinksUpdate does not have outer scope to Wikimedia\Rdbms\LBFactory::getEmptyTransactionTicket: LinksUpdate does not have outer scope. May 28 2019, 8:05 PM
Krinkle added subscribers: DannyS712, kchapman.

Change 513213 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Make sure that each DataUpdate still has outer transaction scope

https://gerrit.wikimedia.org/r/513213

Change 513526 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@wmf/1.34.0-wmf.7] Make sure that each DataUpdate still has outer transaction scope

https://gerrit.wikimedia.org/r/513526

Change 513213 merged by jenkins-bot:
[mediawiki/core@master] Make sure that each DataUpdate still has outer transaction scope

https://gerrit.wikimedia.org/r/513213

Change 513526 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.7] Make sure that each DataUpdate still has outer transaction scope

https://gerrit.wikimedia.org/r/513526

Change 514249 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@wmf/1.34.0-wmf.7] Re-apply "Make sure that each DataUpdate still has outer transaction scope"

https://gerrit.wikimedia.org/r/514249

Change 514249 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.7] Re-apply "Make sure that each DataUpdate still has outer transaction scope"

https://gerrit.wikimedia.org/r/514249

Mentioned in SAL (#wikimedia-operations) [2019-06-04T11:09:03Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.6/includes/: T221577 / rMW1286d131c018 (duration: 01m 07s)

Just to be clear, this is on HEAD, but not yet deployed/live, correct? https://logstash.wikimedia.org/goto/9eb41e92236727f9095b28489dc244a6

No problem if I am right, just to make sure nothing is being missed.

Mentioned in SAL (#wikimedia-operations) [2019-06-04T11:39:58Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.7/includes/: T221577 / rMW1286d131c018 (duration: 01m 04s)

Change 514273 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Depool labsdb1011 for maintenance

https://gerrit.wikimedia.org/r/514273

Change 514273 merged by Jcrespo:
[operations/puppet@production] mariadb: Depool labsdb1011 for maintenance

https://gerrit.wikimedia.org/r/514273

Higher level issue of re-trying LinksUpdate now tracked at T206283.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM