Page MenuHomePhabricator

CI job wmf-quibble-selenium-php72-docker get a deadlock on `change_tag_def` (NameTableStore)
Closed, ResolvedPublic

Description

https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/129980/console

21:42:49 Dumping file(s) mw-dberror.log
21:42:49 mw-error.log
21:42:49 + set +e
21:42:49 + cd /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker/log
21:42:49 + grep --color . mw-dberror.log mw-error.log
21:42:49 mw-dberror.log:Tue Jan 11 21:41:50 UTC 2022	b65ce7f67cb6	wikidb	Error 1213 from MediaWiki\Storage\NameTableStore::store, Deadlock found when trying to get lock; try restarting transaction (localhost:/workspace/db/quibble-mysql-y3a6rjcn/socket) INSERT IGNORE INTO `change_tag_def` (ctd_name,ctd_user_defined,ctd_count) VALUES ('mobile web edit',0,0) localhost:/workspace/db/quibble-mysql-y3a6rjcn/socket
21:42:49 mw-dberror.log:#0 /workspace/src/includes/libs/rdbms/database/Database.php(1758): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
21:42:49 mw-dberror.log:#1 /workspace/src/includes/libs/rdbms/database/Database.php(1302): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
21:42:49 mw-dberror.log:#2 /workspace/src/includes/libs/rdbms/database/Database.php(2548): Wikimedia\Rdbms\Database->query(string, string, integer)
21:42:49 mw-dberror.log:#3 /workspace/src/includes/libs/rdbms/database/Database.php(2508): Wikimedia\Rdbms\Database->doInsertNonConflicting(string, array, string)
21:42:49 mw-dberror.log:#4 /workspace/src/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert(string, array, string, array)
21:42:49 mw-dberror.log:#5 /workspace/src/includes/libs/rdbms/database/DBConnRef.php(351): Wikimedia\Rdbms\DBConnRef->__call(string, array)
21:42:49 mw-dberror.log:#6 /workspace/src/includes/Storage/NameTableStore.php(411): Wikimedia\Rdbms\DBConnRef->insert(string, array, string, array)
21:42:49 mw-dberror.log:#7 /workspace/src/includes/libs/rdbms/database/Database.php(4841): MediaWiki\Storage\NameTableStore->MediaWiki\Storage\{closure}(Wikimedia\Rdbms\DatabaseMysqli, string)
21:42:49 mw-dberror.log:#8 /workspace/src/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->doAtomicSection(string, Closure, string)
21:42:49 mw-dberror.log:#9 /workspace/src/includes/libs/rdbms/database/DBConnRef.php(621): Wikimedia\Rdbms\DBConnRef->__call(string, array)
21:42:49 mw-dberror.log:#10 /workspace/src/includes/Storage/NameTableStore.php(432): Wikimedia\Rdbms\DBConnRef->doAtomicSection(string, Closure, string)
21:42:49 mw-dberror.log:#11 /workspace/src/includes/Storage/NameTableStore.php(168): MediaWiki\Storage\NameTableStore->store(string)
21:42:49 mw-dberror.log:#12 /workspace/src/includes/changetags/ChangeTags.php(464): MediaWiki\Storage\NameTableStore->acquireId(string)
21:42:49 mw-dberror.log:#13 /workspace/src/includes/changetags/ChangeTags.php(336): ChangeTags::updateTags(array, array, integer, integer, string, NULL, RecentChange)
21:42:49 mw-dberror.log:#14 /workspace/src/includes/changes/RecentChange.php(495): ChangeTags::addTags(array, integer, integer, string, NULL, RecentChange)
21:42:49 mw-dberror.log:#15 /workspace/src/extensions/Wikibase/client/includes/Changes/InjectRCRecordsJob.php(279): RecentChange->save()
21:42:49 mw-dberror.log:#16 /workspace/src/includes/jobqueue/JobRunner.php(373): Wikibase\Client\Changes\InjectRCRecordsJob->run()
21:42:49 mw-dberror.log:#17 /workspace/src/includes/jobqueue/JobRunner.php(334): JobRunner->doExecuteJob(Wikibase\Client\Changes\InjectRCRecordsJob)
21:42:49 mw-dberror.log:#18 /workspace/src/includes/jobqueue/JobRunner.php(247): JobRunner->executeJob(Wikibase\Client\Changes\InjectRCRecordsJob)
21:42:49 mw-dberror.log:#19 /workspace/src/includes/MediaWiki.php(1189): JobRunner->run(array)
21:42:49 mw-dberror.log:#20 /workspace/src/includes/MediaWiki.php(636): MediaWiki->triggerSyncJobs(integer)
21:42:49 mw-dberror.log:#21 /workspace/src/includes/deferred/TransactionRoundDefiningUpdate.php(26): MediaWiki->{closure}()
21:42:49 mw-dberror.log:#22 /workspace/src/includes/deferred/DeferredUpdates.php(507): TransactionRoundDefiningUpdate->doUpdate()
21:42:49 mw-dberror.log:#23 /workspace/src/includes/deferred/DeferredUpdates.php(392): DeferredUpdates::attemptUpdate(TransactionRoundDefiningUpdate, Wikimedia\Rdbms\LBFactorySimple)
21:42:49 mw-dberror.log:#24 /workspace/src/includes/deferred/DeferredUpdates.php(221): DeferredUpdates::run(TransactionRoundDefiningUpdate, Wikimedia\Rdbms\LBFactorySimple, MediaWiki\Logger\LegacyLogger, BufferingStatsdDataFactory, string)
21:42:49 mw-dberror.log:#25 /workspace/src/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(TransactionRoundDefiningUpdate, integer)
21:42:49 mw-dberror.log:#26 /workspace/src/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
21:42:49 mw-dberror.log:#27 /workspace/src/includes/deferred/DeferredUpdates.php(242): DeferredUpdatesScope->processUpdates(integer, Closure)
21:42:49 mw-dberror.log:#28 /workspace/src/includes/MediaWiki.php(1114): DeferredUpdates::doUpdates()
21:42:49 mw-dberror.log:#29 /workspace/src/includes/MediaWiki.php(839): MediaWiki->restInPeace()
21:42:49 mw-dberror.log:#30 /workspace/src/includes/MediaWiki.php(588): MediaWiki->doPostOutputShutdown()
21:42:49 mw-dberror.log:#31 /workspace/src/index.php(53): MediaWiki->run()
21:42:49 mw-dberror.log:#32 /workspace/src/index.php(46): wfIndexMain()
21:42:49 mw-dberror.log:#33 {main}
21:42:49 mw-dberror.log:Tue Jan 11 21:41:50 UTC 2022	b65ce7f67cb6	wikidb	Error 1305 from MediaWiki\Storage\NameTableStore::store, SAVEPOINT wikimedia_rdbms_atomic1 does not exist (localhost:/workspace/db/quibble-mysql-y3a6rjcn/socket) ROLLBACK TO SAVEPOINT `wikimedia_rdbms_atomic1` localhost:/workspace/db/quibble-mysql-y3a6rjcn/socket
21:42:49 mw-dberror.log:#0 /workspace/src/includes/libs/rdbms/database/Database.php(1758): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
21:42:49 mw-dberror.log:#1 /workspace/src/includes/libs/rdbms/database/Database.php(1302): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
21:42:49 mw-dberror.log:#2 /workspace/src/includes/libs/rdbms/database/Database.php(4596): Wikimedia\Rdbms\Database->query(string, string, integer)
21:42:49 mw-dberror.log:#3 /workspace/src/includes/libs/rdbms/database/Database.php(4806): Wikimedia\Rdbms\Database->doRollbackToSavepoint(string, string)
21:42:49 mw-dberror.log:#4 /workspace/src/includes/libs/rdbms/database/Database.php(4845): Wikimedia\Rdbms\Database->cancelAtomic(string, Wikimedia\Rdbms\AtomicSectionIdentifier)
21:42:49 mw-dberror.log:#5 /workspace/src/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->doAtomicSection(string, Closure, string)
21:42:49 mw-dberror.log:#6 /workspace/src/includes/libs/rdbms/database/DBConnRef.php(621): Wikimedia\Rdbms\DBConnRef->__call(string, array)
21:42:49 mw-dberror.log:#7 /workspace/src/includes/Storage/NameTableStore.php(432): Wikimedia\Rdbms\DBConnRef->doAtomicSection(string, Closure, string)
21:42:49 mw-dberror.log:#8 /workspace/src/includes/Storage/NameTableStore.php(168): MediaWiki\Storage\NameTableStore->store(string)

Maybe happen due to parallel? T226869

Possible related T298682

Event Timeline

tstarling added subscribers: Ladsgroup, tstarling.

I saw this at 896198. The test is flaky because the job is flaky.

InjectRCRecordsJob has a transaction wrapped around the whole job since 3aa96d61e24885c8a8d303692aac0c424b675681 at @Ladsgroup's request. Reverting that change would probably fix the issue. Deadlocks are expected if transactions are long and complex.

Context: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/699194
And T281453

The work was done to inject and encapsulate db connections (given that wikibase needs to set $domain in getting LB and the connection and that has caused many issues, it was abstracted away) and was not intended to actually wrap the whole job into one massive transaction. One rather easy way to do it is to simply drop the transaction wrapper or cut it into smaller transactions. I leave it to engineers in WMDE to decide. cc. @Lucas_Werkmeister_WMDE and @Michael

I think this should be considered a wider team effort here, adding also @Jakob_WMDE @Tarrow here.

I’m pretty confused by the Gerrit discussion on that old change, but if I understand correctly, I guess we can just remove the transaction around the big loop in run()? Having beginAtomic() / endAtomic() around the entire contents of run() doesn’t seem very useful indeed.

Change 899615 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] InjectRCRecordsJob: Remove transaction

https://gerrit.wikimedia.org/r/899615

Change 899615 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] InjectRCRecordsJob: Remove transaction

https://gerrit.wikimedia.org/r/899615

Have merged a sub task with the same error message/stack trace, so the remove of the transaction is not enough

Hm, if this is still happening according to T332540 after the above change was merged, then that might not have been enough to fix the issue :(

Maybe the job could reduce its transaction size by doing lower number of titles? like reduce the batch size in tests. Just thinking out loud.

Change 930752 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/Wikibase@master] client: Reduce injecting RC records batch size in CI

https://gerrit.wikimedia.org/r/930752

This patch would reduce the size of titles being sent to a batch from 300 to 10 in CI: See WikiPageUpdater::injectRCRecords().

Change 930752 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] client: Reduce injecting RC records batch size in CI

https://gerrit.wikimedia.org/r/930752

Ladsgroup claimed this task.

I close this, it's really hard to test that it really works. But it should at least drastically reduce the frequency.