WikibaseLexeme 'jenkins_u0_mw.unittest_content_models' doesn't exist
Closed, Resolved · Public

Description

This is currently blocking WikibaseLexeme CI, and thus all other CI that loads this extension, Wikibase, etc.

Failure can be seen in WikibaseLexeme changes.

1) Wikibase\Lexeme\Tests\MediaWiki\Specials\LexemeSpecialEntityDataTest::testSensesKeyExistsInJsonWhenEnabled
16:20:24 Wikimedia\Rdbms\DBQueryError: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
16:20:24 Query: SELECT  model_id AS `id`,model_name AS `name`  FROM `unittest_content_models`     ORDER BY id 
16:20:24 Function: MediaWiki\Storage\NameTableStore::loadTable
16:20:24 Error: 1146 Table 'jenkins_u0_mw.unittest_content_models' doesn't exist (127.0.0.1:3306)
16:20:24 2) Wikibase\Lexeme\Tests\MediaWiki\Specials\LexemeSpecialEntityDataTest::testSensesKeyDoesntExistInJsonWhenDisabled
16:20:24 Wikimedia\Rdbms\DBQueryError: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
16:20:24 Query: SELECT  model_id AS `id`,model_name AS `name`  FROM `unittest_content_models`     ORDER BY id 
16:20:24 Function: MediaWiki\Storage\NameTableStore::loadTable
16:20:24 Error: 1146 Table 'jenkins_u0_mw.unittest_content_models' doesn't exist (127.0.0.1:3306)

https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer-jessie/15531/console

Event Timeline

Tarrow created this task. Aug 3 2018, 7:54 AM
Restricted Application added a subscriber: Aklapper. Aug 3 2018, 7:55 AM

The current plan is to replicate the tests in Quibble, but using non-temporary tables.

Addshore claimed this task. Aug 3 2018, 8:17 AM
Restricted Application added a project: User-Addshore. Aug 3 2018, 8:18 AM
Addshore added a comment. (Edited) Aug 3 2018, 8:37 AM

I tracked this down to:

$ git bisect bad
1e079652e078715160013e11df3f8d85ecc83d26 is the first bad commit
commit 1e079652e078715160013e11df3f8d85ecc83d26
Author: Aryeh Gregor <ayg@aryeh.name>
Date:   Wed Jul 25 17:37:16 2018 +0300

    Introduce ContentLanguage service to replace $wgContLang

    Bug: T200246
    Depends-On: I31c2e20fc70ba3cbc124b9f462f4924a139dd9bd
    Depends-On: I4aaf1c641ec6abef214eb96c0e4b42a67488ac00
    Depends-On: I461cf2f441a4040bb15d6c4bb93ce6114c143845
    Depends-On: I4b1cc4257348d1773fd2ccf045966261f801e7d0
    Depends-On: I9790b7efdd484366dc36eb8880778aea1a559e5e
    Change-Id: I193f5b9a95430b0a05573c361715e053e5411e32

:040000 040000 41f756809ae477d91d898f81564378e40fbc469f f98c46bd4367db45b8eb6621ace93843dd2d7eea M  includes
:040000 040000 fcce0c98b978c37e272f822bbbda05f3e0c66d0c 82baf89d193b08bd61fb7aa3adec8338e21104b0 M  tests

Tagging T200246

Addshore triaged this task as Unbreak Now! priority. Aug 3 2018, 8:45 AM
Addshore moved this task from Unsorted 💣 to Back Burner 🏛️ on the User-Addshore board.
Restricted Application added subscribers: Liuxinyu970226, TerraCodes. Aug 3 2018, 8:45 AM
Addshore updated the task description. Aug 3 2018, 8:46 AM
Addshore added a subscriber: Legoktm.

Change 450200 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/core@master] DNM Example of ContentLanguage patch breaking stuff

https://gerrit.wikimedia.org/r/450200

I have narrowed down the failure to running these 2 tests together:

mw-docker-dev phpunit default --filter '(LexemeDiffVisualizerIntegrationTest|LexemeSpecialEntityDataTest)' //var/www/mediawiki/extensions/WikibaseLexeme/tests/phpunit/mediawiki

With https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseLexeme/+/450054/ checked out (which fixes another issue that happened recently), the table issue still occurs.

Addshore added a comment. (Edited) Aug 3 2018, 10:57 AM

So, as suspected, this is down to a second DB connection being opened during the tests. That connection doesn't have access to the temporary tables created on the first DB connection.

I tracked this down to 'localAutoCommit' being used for the connection type with the following stack trace:

The check in openConnection looks for an existing connection matching the connection key and index. When a different connection key is used (there are a few), no connection is found at that index, so a new connection is created.

The only occurrence of localAutoCommit I can find is in LoadBalancer itself.

	const KEY_LOCAL_NOROUND = 'localAutoCommit';

KEY_LOCAL_NOROUND also only occurs in the LoadBalancer class.

It is set with the following condition in openConnection:

			$connKey = $autoCommit ? self::KEY_LOCAL_NOROUND : self::KEY_LOCAL;

autoCommit is true for the stack provided, and autoCommit is determined by the flags passed:

		$autoCommit = ( ( $flags & self::CONN_TRX_AUTOCOMMIT ) == self::CONN_TRX_AUTOCOMMIT );

The flags value is 1 in the stack trace.
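To make the mechanism concrete, here is a minimal, self-contained sketch of how keying connections by connection key and index can produce a second connection that cannot see the first one's temporary tables. This is a toy model and not MediaWiki's LoadBalancer: the classes `ToyLoadBalancer` and `ToyConnection` are hypothetical, the value of `KEY_LOCAL` is assumed, and `CONN_TRX_AUTOCOMMIT = 1` is assumed only because the flags value in the stack trace is 1.

```php
<?php
// Toy model only: these classes are hypothetical and merely mirror the names
// quoted in this ticket. They are not MediaWiki's real implementation.

final class ToyConnection {
    /** @var array<string, bool> temporary tables visible to this connection only */
    private $tempTables = [];

    public function createTemporaryTable( string $name ): void {
        $this->tempTables[$name] = true;
    }

    public function tableExists( string $name ): bool {
        return isset( $this->tempTables[$name] );
    }
}

final class ToyLoadBalancer {
    const KEY_LOCAL = 'local'; // assumed value, for illustration
    const KEY_LOCAL_NOROUND = 'localAutoCommit';
    const CONN_TRX_AUTOCOMMIT = 1; // assumed bit value, matching flags = 1

    /** @var array<string, array<int, ToyConnection>> connections by key, then index */
    private $conns = [];

    public function openConnection( int $index, int $flags = 0 ): ToyConnection {
        // Same bitmask test and key selection as quoted above
        $autoCommit = ( ( $flags & self::CONN_TRX_AUTOCOMMIT ) == self::CONN_TRX_AUTOCOMMIT );
        $connKey = $autoCommit ? self::KEY_LOCAL_NOROUND : self::KEY_LOCAL;

        // A connection is only reused if one exists for this key AND this index
        if ( !isset( $this->conns[$connKey][$index] ) ) {
            $this->conns[$connKey][$index] = new ToyConnection();
        }
        return $this->conns[$connKey][$index];
    }
}

$lb = new ToyLoadBalancer();

// The test setup creates its temporary tables on the default connection
$first = $lb->openConnection( 0 );
$first->createTemporaryTable( 'unittest_content_models' );

// A later caller asks for the same server index, but with the autocommit flag
$second = $lb->openConnection( 0, ToyLoadBalancer::CONN_TRX_AUTOCOMMIT );

var_dump( $first === $second ); // bool(false)
var_dump( $second->tableExists( 'unittest_content_models' ) ); // bool(false)
```

In the toy model the second lookup lands under a different key, so a fresh connection is created for the same server index, which is the situation that would lead to the "Table ... doesn't exist" error seen in the CI log.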

Looking through the stack to find where this is set, NameTableStore has the following on line 159:

$this->getDBConnection( DB_MASTER, LoadBalancer::CONN_TRX_AUTOCOMMIT )

which was introduced in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/424188/ by @aaron

We need some extra handling so that, while tests are running with temporary tables, only a single connection to the DB is used, even for the other connection types that are possible.
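One possible shape for that extra handling, sketched under stated assumptions: a connection pool that, once a "using temporary tables" mode is switched on, hands back the first connection it opened regardless of the connection key requested. The class `SingleConnectionPool` and its methods are hypothetical and do not correspond to any existing MediaWiki API; the patches linked in this ticket may take a different approach.

```php
<?php
// Hypothetical sketch: names are invented for illustration and this is not
// how MediaWikiTestCase or LoadBalancer are actually implemented.

final class SingleConnectionPool {
    /** @var array<string, array<int, stdClass>> connections by key, then index */
    private $conns = [];

    /** @var bool whether the test run relies on per-connection temporary tables */
    private $usingTempTables = false;

    public function setUsingTempTables( bool $flag ): void {
        $this->usingTempTables = $flag;
    }

    public function getConnection( string $connKey, int $index ): stdClass {
        if ( $this->usingTempTables && $this->conns !== [] ) {
            // Temporary tables are only visible on the connection that created
            // them, so reuse the first connection opened, whatever key was asked for.
            $firstByKey = reset( $this->conns );
            return reset( $firstByKey );
        }

        if ( !isset( $this->conns[$connKey][$index] ) ) {
            $this->conns[$connKey][$index] = new stdClass();
        }
        return $this->conns[$connKey][$index];
    }
}

$pool = new SingleConnectionPool();
$pool->setUsingTempTables( true );

$a = $pool->getConnection( 'local', 0 );
$b = $pool->getConnection( 'localAutoCommit', 0 );

var_dump( $a === $b ); // bool(true)
```

A trade-off of this approach is that a caller asking for an autocommit connection would end up sharing whatever transaction is open on the reused connection. An alternative is to throw an exception instead of reusing, so that tests fail loudly when a second connection is requested.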

Addshore added a subscriber: aaron. Aug 3 2018, 10:57 AM
Addshore added a comment. (Edited) Aug 3 2018, 12:17 PM

The other thing to note is that AUTOCOMMIT is only triggered when a race happens in the code:

				// RACE: $name was already in the db, probably just inserted, so load from master
				// Use DBO_TRX to avoid missing inserts due to other threads or REPEATABLE-READs
				$table = $this->loadTable(
					$this->getDBConnection( DB_MASTER, LoadBalancer::CONN_TRX_AUTOCOMMIT )
				);

Not sure why this race is happening. I'm guessing it is due to the swapping in and out of different NameTableStore instances while overriding other services...

For the failing tests linked above, the race condition happens when trying to store the content model "wikibase-lexeme". This can be found with a breakpoint on the race condition point in NameTableStore.
The in-process cached version of the table only includes the 'wikitext' content type. However, when trying to insert "wikibase-lexeme", it is apparently already there, which triggers the race. The race and AUTOCOMMIT then trigger the error that we actually see in this ticket.
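The sequence described here can be modelled in a few lines. The sketch below is a toy and not NameTableStore's real code: the class `ToyNameStore`, its arrays standing in for the database table and the in-process cache, and the id assignment are all hypothetical. It only shows how a stale cache plus an already existing row leads into the "race" branch, which is the branch where the autocommit connection is requested.

```php
<?php
// Toy model of the flow described in this comment. Hypothetical class; the
// database table and the in-process cache are both plain arrays here.

final class ToyNameStore {
    /** @var array<string, int> rows as stored in the database: name => id */
    private $dbRows;

    /** @var array<string, int> in-process cache, which may be stale */
    private $cache;

    /** @var string[] which path each call to acquireId() took */
    public $log = [];

    public function __construct( array $dbRows, array $cache ) {
        $this->dbRows = $dbRows;
        $this->cache = $cache;
    }

    public function acquireId( string $name ): int {
        if ( isset( $this->cache[$name] ) ) {
            $this->log[] = 'cache-hit';
            return $this->cache[$name];
        }

        if ( !isset( $this->dbRows[$name] ) ) {
            // Normal path: the name is new, so the insert succeeds
            $id = count( $this->dbRows ) + 1;
            $this->dbRows[$name] = $id;
            $this->cache[$name] = $id;
            $this->log[] = 'inserted';
            return $id;
        }

        // "Race" path: the row already exists although the cache did not know it.
        // In the ticket, this is the point where the table is reloaded from the
        // master using a CONN_TRX_AUTOCOMMIT connection.
        $this->log[] = 'race-reload';
        $this->cache = $this->dbRows;
        return $this->cache[$name];
    }
}

$store = new ToyNameStore(
    [ 'wikitext' => 1, 'wikibase-lexeme' => 2 ], // database contents
    [ 'wikitext' => 1 ] // stale in-process cache
);

$id = $store->acquireId( 'wikibase-lexeme' );
echo $id, ' ', implode( ',', $store->log ), "\n"; // 2 race-reload
```

In this model no concurrent writer is needed to reach the race branch. A cache that was populated before another store instance inserted the row is enough, which would be consistent with the guess above about NameTableStore instances being swapped in and out.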

This might warrant a little more digging into as I still smell a fish.

Tagging Multi-Content-Revisions as NameTableStore was introduced as part of MCR

Note: This is blocking Content Translation CI now. CX was blocked for a week by T200693 and now this one too.

Change 450512 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/WikibaseLexeme@master] Temporarily skip LexemeSpecialEntityDataTest

https://gerrit.wikimedia.org/r/450512

Change 450524 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/core@master] MediaWikiTestCase, whil using temp tables assert only 1 conn is used

https://gerrit.wikimedia.org/r/450524

Change 450531 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/WikibaseLexeme@master] LexemeDiffVisualizerIntegrationTest mark tables used

https://gerrit.wikimedia.org/r/450531

Change 450531 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Mark tables used in multiple tests

https://gerrit.wikimedia.org/r/450531

Change 450512 abandoned by WMDE-leszek:
Temporarily skip LexemeSpecialEntityDataTest

https://gerrit.wikimedia.org/r/450512

Addshore lowered the priority of this task from Unbreak Now! to High. Aug 6 2018, 4:08 PM

This currently appears to be fixed within WikibaseLexeme

Addshore closed this task as Resolved. Aug 17 2018, 7:16 AM

I'm going to close this ticket and file a follow up for the MediaWikiTestCase temporary table & LoadBalancer connection category issue

Change 450200 abandoned by Addshore:
DNM Example of ContentLanguage patch breaking stuff

https://gerrit.wikimedia.org/r/450200

Change 450524 abandoned by Addshore:
MediaWikiTestCase, whil using temp tables assert only 1 conn is used

https://gerrit.wikimedia.org/r/450524