WikibaseLexeme CI broken (database errors)
Closed, ResolvedPublic

Description

For the past few hours, CI on Wikibase-related repositories has been mostly broken (both test builds and gate-and-submit). Some example changes:

Some example consoles:

They all seem to have similar error outputs: first some NameTableAccessExceptions…

Expected unused ID from database insert for 'Scribunto' into 'content_models', but ID 2 is already associated with the name 'wikibase-item'! This may indicate database corruption!

… and then some query errors:

Wikimedia\Rdbms\DBTransactionStateError: Cannot execute query from MessageCache::loadFromDB(en)-big while transaction status is ERROR.

Caused by
Wikimedia\Rdbms\DBUnexpectedError: Uncancelable atomic section canceled (got MediaWiki\Storage\RevisionStore::insertRevisionOn).

Warning: Destructor threw an object exception: exception 'Wikimedia\Rdbms\DBUnexpectedError' with message 'Uncancelable atomic section canceled (got MediaWiki\Storage\RevisionStore::insertRevisionOn).' in /workspace/src/includes/libs/rdbms/database/Database.php:3735

Event Timeline

Looking at recent core merges, it must be one of these two patches:

Introduce RevisionRecord::isReadForInsertion https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/459620/
[MCR] Set MCR migration stage to write-both/read-new. … https://gerrit.wikimedia.org/r/#/c/443831/

Perhaps core should be running against WikibaseLexeme, at least on gate? I thought it already was...

Addshore raised the priority of this task from High to Unbreak Now! (Sep 11 2018, 3:46 PM)

The NameTableAccessException feels like it could be happening because the content_models table is purged between two tests, but the NameTableStore’s cache of it isn’t purged. I had a similar problem in WikibaseQualityConstraints (fixed in I8cdebc0eef).

Yup, @daniel confirmed that is likely what is happening.
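
The suspected failure mode can be modelled in a few lines. This is only an illustrative Python sketch, not MediaWiki code: `FakeTable`, `CachingNameStore` and `run_two_tests` are hypothetical names, and it assumes that purging the table between tests also restarts its ID sequence. A store that keeps its in-memory name cache across the purge then sees the database hand out an ID it still believes belongs to another name, while resetting the store between tests avoids the clash.

```python
class NameTableAccessError(Exception):
    """Hypothetical stand-in for the NameTableAccessException in the logs."""


class FakeTable:
    """In-memory stand-in for a table such as content_models."""

    def __init__(self):
        self.rows = {}    # id -> name
        self.next_id = 1  # simulated auto-increment counter

    def insert(self, name):
        row_id = self.next_id
        self.rows[row_id] = name
        self.next_id += 1
        return row_id

    def purge(self):
        # Assumption: test teardown drops all rows and restarts the IDs.
        self.rows.clear()
        self.next_id = 1


class CachingNameStore:
    """Maps names to IDs and caches every mapping it has seen."""

    def __init__(self, table):
        self.table = table
        self.cache = {}   # id -> name

    def acquire_id(self, name):
        # Serve known names from the cache without touching the table.
        for row_id, cached_name in self.cache.items():
            if cached_name == name:
                return row_id
        row_id = self.table.insert(name)
        if row_id in self.cache:
            # The table returned an ID the cache still maps to another name.
            raise NameTableAccessError(
                f"Expected unused ID from database insert for {name!r}, "
                f"but ID {row_id} is already associated with the name "
                f"{self.cache[row_id]!r}"
            )
        self.cache[row_id] = name
        return row_id

    def reset(self):
        # Drop the cached mappings so they cannot outlive the table contents.
        self.cache.clear()


def run_two_tests(reset_between):
    table = FakeTable()
    store = CachingNameStore(table)
    # "Test 1" registers two content models.
    store.acquire_id("wikitext")
    store.acquire_id("wikibase-item")
    # Teardown purges the table; the store's cache is only cleared on request.
    table.purge()
    if reset_between:
        store.reset()
    # "Test 2" registers a new content model.
    return store.acquire_id("Scribunto")
```

Calling `run_two_tests(reset_between=False)` raises the stand-in exception, while `run_two_tests(reset_between=True)` completes normally.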

Change 459812 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/core@master] [DNM] Set MCR migration stage to write-both/read-old

https://gerrit.wikimedia.org/r/459812

Change 459813 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] [DNM] empty change to test CI

https://gerrit.wikimedia.org/r/459813

Change 459813 abandoned by Lucas Werkmeister (WMDE):
[DNM] empty change to test CI

Reason:
CI passed.

https://gerrit.wikimedia.org/r/459813

Investigation is still going on, but it looks like reverting the MCR migration stage would be a simple way to temporarily unbreak this, if desired. See I5426b5efd0 (but change the commit message before merging that, of course ☺)

Change 459820 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Reset the NameTableStoreFactory service in ParserTestRunner

https://gerrit.wikimedia.org/r/459820

Change 459821 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/WikibaseLexeme@master] Add WikibaseLexemeExtensionRegistrationTest to group Database

https://gerrit.wikimedia.org/r/459821

Change 459823 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/core@master] Only setup DB in tests that say they need the DB

https://gerrit.wikimedia.org/r/459823

Change 459823 abandoned by Addshore:
Only setup DB in tests that say they need the DB

Reason:
abandoning as there is an identical patch

https://gerrit.wikimedia.org/r/459823

Change 459812 merged by jenkins-bot:
[mediawiki/core@master] Revert MCR migration stage to write-both/read-old

https://gerrit.wikimedia.org/r/459812

greg lowered the priority of this task from Unbreak Now! to High (Sep 11 2018, 6:32 PM).
greg subscribed.

So after those reverts it's no longer broken, correct? Resetting priority based on that assumption. I'll leave you all to determine follow-ups/next steps.

The "fix" has derailed MCR/SDC work. Not sure it's not still UBN…

Sorry, I couldn't tell from this task what the issue is; is it CI? If so in what way? Is it the change/how things are being done? If so how? :) Just trying to make informed decisions :)

Switching MCR in master to read-new (T198561) broke the unit tests of a number of Wikibase-related extensions (so they couldn't merge anything, which is sad). Now we're back to read-old in master, which has unbroken Wikibase but means the MCR team will need to fix the Wikibase tests and restore read-new before they can proceed.

So, the CI is now fixed.
As a follow-up, it could be a good idea to add WikibaseLexeme to the list of gated extensions for core. @greg @hashar?

Change 459991 had a related patch set uploaded (by Addshore; owner: Addshore):
[integration/config@master] Add more Wikibase extensions to gatedextensions

https://gerrit.wikimedia.org/r/459991

Change 460017 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/extensions/WikibaseLexeme@master] DNM: check that Lexeme CI passes in MCR read-new mode.

https://gerrit.wikimedia.org/r/460017

Jdforrester-WMF renamed this task from "Wikibase CI broken (database errors)" to "WikibaseLexeme CI broken (database errors)" (Sep 12 2018, 3:24 PM).

Change 459820 merged by jenkins-bot:
[mediawiki/core@master] Reset services in ParserTestTopLevelSuite.

https://gerrit.wikimedia.org/r/459820

Change 459821 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add WikibaseLexemeExtensionRegistrationTest to group Database

https://gerrit.wikimedia.org/r/459821

So, the CI is now fixed.
As a follow-up, it could be a good idea to add WikibaseLexeme to the list of gated extensions for core. @greg @hashar?

Filed as T204153

Change 460017 abandoned by Jforrester:
DNM: check that Lexeme CI passes in MCR read-new mode.

Reason:
Testing over, this has landed.

https://gerrit.wikimedia.org/r/460017