Page MenuHomePhabricator

SearchEngineTest fails during PHPUnit coverage job "no such table: unittest_page"
Closed, ResolvedPublic

Description

Both the php5 and php7 jobs for MediaWiki core's code coverage (PHPUnit), are failing with the following error since 6 April at 3 AM. It last passed 5 April 3 PM.

https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/197/console

............[ad7cb47bb9bae5ed9b5a55cc] [no req]   Wikimedia\Rdbms\DBQueryError from line 1439 of /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model  FROM unittest_page    WHERE page_namespace = '1' AND page_title = 'Not_Main_Page'  LIMIT 1  
Function: WikiPage::pageData
Error: 1 no such table: unittest_page

Backtrace:
#0 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/libs/rdbms/database/Database.php(1409): Wikimedia\Rdbms\Database->makeQueryException(string, integer, string, string)
#1 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/libs/rdbms/database/Database.php(1182): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/libs/rdbms/database/Database.php(1635): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/libs/rdbms/database/Database.php(1712): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#4 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/page/WikiPage.php(380): Wikimedia\Rdbms\Database->selectRow(array, array, array, string, array, array)
#5 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/page/WikiPage.php(400): WikiPage->pageData(Wikimedia\Rdbms\DatabaseSqlite, array, array)
#6 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/page/WikiPage.php(438): WikiPage->pageDataFromTitle(Wikimedia\Rdbms\DatabaseSqlite, Title, array)
#7 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/includes/page/WikiPage.php(1605): WikiPage->loadPageData(integer)
#8 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/tests/phpunit/MediaWikiTestCase.php(1018): WikiPage->doEditContent(WikitextContent, string, integer, boolean, User)
#9 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/tests/phpunit/includes/search/SearchEngineTest.php(60): MediaWikiTestCase->insertPage(string, string)
#10 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/tests/phpunit/MediaWikiTestCase.php(414): SearchEngineTest->addDBDataOnce()
#11 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/vendor/phpunit/phpunit/src/Framework/TestSuite.php(722): MediaWikiTestCase->run(PHPUnit_Framework_TestResult)
#12 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/vendor/phpunit/phpunit/src/Framework/TestSuite.php(722): PHPUnit_Framework_TestSuite->run(PHPUnit_Framework_TestResult)
#13 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/vendor/phpunit/phpunit/src/Framework/TestSuite.php(722): PHPUnit_Framework_TestSuite->run(PHPUnit_Framework_TestResult)
#14 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/vendor/phpunit/phpunit/src/TextUI/TestRunner.php(440): PHPUnit_Framework_TestSuite->run(PHPUnit_Framework_TestResult)
#15 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/vendor/phpunit/phpunit/src/TextUI/Command.php(149): PHPUnit_TextUI_TestRunner->doRun(PHPUnit_Framework_TestSuite, array)
#16 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/vendor/phpunit/phpunit/src/TextUI/Command.php(100): PHPUnit_TextUI_Command->run(array, boolean)
#17 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/tests/phpunit/phpunit.php(134): PHPUnit_TextUI_Command::main()
#18 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/maintenance/doMaintenance.php(94): PHPUnitMaintClass->execute()
#19 /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage-php7/src/tests/phpunit/phpunit.php(173): require(string)
#20 {main}

Event Timeline

Legoktm triaged this task as High priority.Apr 10 2018, 1:30 AM
Legoktm created this task.
Restricted Application added projects: Discovery, Discovery-Search. · View Herald TranscriptApr 10 2018, 1:30 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Krinkle renamed this task from Coverage PHPUnit is failing (sqlite): Error: 1 no such table: unittest_page to Coverage PHP jobs failing "no such table: unittest_page".Apr 10 2018, 2:01 PM
Krinkle renamed this task from Coverage PHP jobs failing "no such table: unittest_page" to SearchEngineTest fails during PHPUnit coverage job "no such table: unittest_page".
Krinkle updated the task description. (Show Details)
Krinkle moved this task from Inbox to PHPUnit on the MediaWiki-Core-Testing board.
EBjune added a subscriber: dcausse.Apr 10 2018, 2:57 PM
EBjune added a subscriber: EBjune.Apr 10 2018, 6:40 PM

We discussed this in Search Platform, determined that it looks like someone changed a job somewhere (we don't know where) that's clearly trying to write to a table that is not there (nothing we're familiar with), but do let us know if something is discovered during investigations into core tests that we can help with.

Anomie added a subscriber: aaron.Apr 10 2018, 7:04 PM

Looking at the end of https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/189/artifact/log/mw-debug-cli.log:

[objectcache] Rejected set() for jenkins_u0_mw:page-restrictions:1478952412:1811 due to pending writes.
Title::getRestrictionTypes: applicable restrictions to [[Not Main Page]] are {edit,move}
User::getBlockedStatus: checking...
WikiPage::doEditUpdates: Using prepared edit...
Saved in parser cache with key jenkins_u0_mw-unittest_:pcache:idhash:1478952412-0!canonical and timestamp 20180406033837 and revision id 1811
BacklinkCache::queryLinks: got results from DB
Title::getRestrictionTypes: applicable restrictions to [[Not Main Page]] are {edit,move}
[DBConnection] Wikimedia\Rdbms\Database::query: lost connection to [Null]; reconnected
[DBQuery] SQL ERROR: database schema has changed

[squid] CdnCacheUpdate::purge: http://localhost/wiki/index.php/Talk:Not_Main_Page http://localhost/wiki/index.php?title=Talk:Not_Main_Page&action=history
[squid] CdnCacheUpdate::purge: http://localhost/wiki/index.php/Not_Main_Page http://localhost/wiki/index.php?title=Not_Main_Page&action=history
[DBQuery] SQL ERROR: no such table: unittest_site_stats

[DBQuery] SQL ERROR: no such table: unittest_page

[DBConnection] Wikimedia\Rdbms\{closure}: closing connection to database ''.

Something or other is making sqlite raise a "database schema has changed" error, which makes MediaWiki drop the DB connection and reconnect, which means all the temporary tables the testing framework had set up are gone.

I don't see anything obvious that would cause that between the last successful run (on rMW60cc82da01c7: Merge "Swap isset by null check in ApiSetNotificationTimestamp/ApiPurge") and the first failing run (on rMWe9553f6ce9db: Merge "tests: Reset Postgres sequences when cloning and truncating"). Any chance someone can reproduce and bisect?

I was able to reproduce using quibble, and here's the bisect result:

395462b7d5e1384f8c8c8df4c1ef7ec6e9fdc573 is the first bad commit
commit 395462b7d5e1384f8c8c8df4c1ef7ec6e9fdc573
Author: Brad Jorsch <bjorsch@wikimedia.org>
Date:   Thu Apr 5 14:16:03 2018 -0400

    rdbms: Roll back empty implicit transaction on error
    
    If we're not going to set trxStatus to an error state in this case, we
    need to issue a rollback to be sure the database (i.e. PostgreSQL) isn't
    still in an error state too.
    
    Bug: T189999
    Change-Id: Id6e203b216fff937b6a97d779b36c278e3366409

This turned out to be rather complicated.

LoadBalancerTest::testWithoutReplica() and LoadBalancerTest::testWithReplica() re-open the sqlite database and drop/create a non-temporary test table on that connection, and then try to rollback the DBO_TRX transaction that would include the temporary table.

Before rMW395462b7d5e1: rdbms: Roll back empty implicit transaction on error, the tests do these SQL statements (when the table doesn't already exist from a previous test run):

-- $db->dropTable() calls $db->tableExists() which results in DBO_TRX starting a transaction
-- because sqlite happens to do the test by doing a SELECT and seeing if that results in an error.
BEGIN
SELECT 1 FROM unittest_some_table LIMIT 1
-- Logged: SQL ERROR (ignored): no such table: unittest_some_table

-- $db->query() to create the test table
CREATE TABLE unittest_some_table (id INT, time INT)

-- $db->query() to attempt a delete
DELETE FROM unittest_some_table WHERE id=57634126

-- $db->query() to drop the test table
DROP TABLE unittest_some_table

-- $db->rollback( __METHOD__, 'flush' ) at the end
ROLLBACK

I guess, since the whole thing gets rolled back, it winds up not writing anything to the sqlite db file and the main connection never knows about it.

After rMW395462b7d5e1: rdbms: Roll back empty implicit transaction on error, the tests do this instead:

-- $db->dropTable() calls $db->tableExists(), which now rolls back the DBO_TRX transaction due to the error.
BEGIN
SELECT 1 FROM unittest_some_table LIMIT 1
ROLLBACK
-- Logged: SQL ERROR (ignored): no such table: unittest_some_table

-- $db->query() to create the test table. This doesn't make DBO_TRX start a transaction,
-- because MySQL (and Oracle) has DDL statements autocommit transactions.
CREATE TABLE unittest_some_table (id INT, time INT)

-- $db->query() to attempt a delete. This does make DBO_TRX start a new transaction!
BEGIN
DELETE FROM unittest_some_table WHERE id=57634126

-- $db->query() to drop the test table
DROP TABLE unittest_some_table

-- $db->rollback( __METHOD__, 'flush' ) at the end
ROLLBACK

The CREATE TABLE winds up being done outside of any transaction, so it remains in the DB and the main connection raises the "schema changed" error when it eventually notices and reconnects.

Change 426120 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] LoadBalancerTest: Clean up transaction handling for sqlite

https://gerrit.wikimedia.org/r/426120

Change 426120 merged by jenkins-bot:
[mediawiki/core@master] LoadBalancerTest: Clean up transaction handling for sqlite

https://gerrit.wikimedia.org/r/426120

Jdforrester-WMF closed this task as Resolved.Apr 13 2018, 11:10 PM
Jdforrester-WMF assigned this task to Anomie.
mmodell changed the subtype of this task from "Task" to "Production Error".Wed, Aug 28, 11:09 PM