Page MenuHomePhabricator

Random test failure: DatabaseMysqlTest::testQueryTimeout: No DBQueryTimeoutError caught
Closed, ResolvedPublic

Description

This gate-and-submit build (for this backport) failed in CI:

There was 1 failure:

1) DatabaseMysqlTest::testQueryTimeout
No DBQueryTimeoutError caught

/workspace/src/tests/phpunit/integration/includes/db/DatabaseMysqlTest.php:56
phpvfscomposer:///workspace/src/vendor/phpunit/phpunit/phpunit:106

Anyone got an idea why? It looks harmless enough to me…

	/**
	 * @covers \Wikimedia\Rdbms\Database::query()
	 */
	public function testQueryTimeout() {
		try {
			$this->conn->query(
				'SET STATEMENT max_statement_time=0.001 FOR SELECT sleep(1) FROM dual',
				__METHOD__
			);
			$this->fail( "No DBQueryTimeoutError caught" );
		} catch ( DBQueryTimeoutError $e ) {
			$this->assertInstanceOf( DBQueryTimeoutError::class, $e );
		}

		$row = $this->conn->query( 'SELECT "x" AS v', __METHOD__ )->fetchObject();
		$this->assertSame( 'x', $row->v, "Still connected/usable" );
	}

Event Timeline

(I see a few other minor issues with this test – AFAIK max_statement_time is specific to MariaDB, and also the $this->assertInstanceOf() on the same class declared in the catch clause seems pointless – but those shouldn’t be related to the test failure.)

Experienced this again in https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php83/1838/console#console-section-6:

14:42:44 There was 1 failure:
14:42:44 
14:42:44 1) DatabaseMysqlTest::testQueryTimeout
14:42:44 No DBQueryTimeoutError caught
14:42:44 
14:42:44 /workspace/src/tests/phpunit/integration/includes/db/DatabaseMysqlTest.php:58

Change #1056950 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] rdbms: Increase sleep in test

https://gerrit.wikimedia.org/r/1056950

Change #1056950 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Increase sleep in test

https://gerrit.wikimedia.org/r/1056950

The above fix doesn't seem to have worked (https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php82/2024/console):

11:23:51 
11:23:51 There was 1 failure:
11:23:51 
11:23:51 1) DatabaseMysqlTest::testQueryTimeout
11:23:51 No DBQueryTimeoutError caught
11:23:51 
11:23:51 /workspace/src/tests/phpunit/integration/includes/db/DatabaseMysqlTest.php:58

mw-debug-cli.log extract:

[PHPUnit] Start test DatabaseMysqlTest::testQueryTimeout
[rdbms] Wikimedia\Rdbms\LoadBalancer::reuseOrOpenConnectionForNewRef: reusing connection for 0/wikidb-unittest_
[objectcache] MainWANObjectCache using store Wikimedia\ObjectCache\HashBagOStuff
[session] SessionBackend "r6of7kg343thvq3n5uddfu1e3pr99u2r" is unsaved, marking dirty in constructor
[session] SessionBackend "r6of7kg343thvq3n5uddfu1e3pr99u2r" save: dataDirty=1 metaDirty=1 forcePersist=0
[session] SessionBackend "r6of7kg343thvq3n5uddfu1e3pr99u2r" data dirty due to dirty(): MediaWiki\Session\SessionManager->getSessionForRequest/MediaWiki\Session\SessionManager->getInitialSession/MediaWiki\Session\Session->getToken/MediaWiki\Session\Session->set/MediaWiki\Session\SessionBackend->dirty
[session] SessionBackend "r6of7kg343thvq3n5uddfu1e3pr99u2r" save: dataDirty=1 metaDirty=0 forcePersist=0
[PHPUnit] FAILURE in test DatabaseMysqlTest::testQueryTimeout: No DBQueryTimeoutError caught

Looking at the associated timestamps for the tests it seems that the line took more than 10s to run:

11:23:33 ...................F................SSSSSSSSS.............. 13157 / 13499 ( 97%)
11:23:45 ........................................................... 13216 / 13499 ( 97%)

Dangit. Thanks for digging that up.

I found MDEV-10529, which suggests that sleep() + max_statement_time is a fragile combination – according to that bug report, and the sleep docs, it’s in fact expected that a query with sleep() and nothing else will complete without error (and see sleep() return 1, rather than 0 as it would when not interrupted), yet when I try it out I always seem to get the “max_statement_time exceeded” error. So my guess is that there’s a bug in MariaDB which causes the sleep() to only return 1 very rarely, but without that bug this test would always fail. (The test was added in 2022 btw.)

Change #1059302 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] rdbms: Allow interrupted sleep() to return 1

https://gerrit.wikimedia.org/r/1059302

Change #1059302 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Allow interrupted sleep() to return 1

https://gerrit.wikimedia.org/r/1059302

Errored again:

13:34:56 1) DatabaseMysqlTest::testQueryTimeout
13:34:56 Failed asserting that '1' is identical to 1.
13:34:56 
13:34:56 /workspace/src/tests/phpunit/integration/includes/db/DatabaseMysqlTest.php:61

Change #1061083 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/core@master] Fix inconsistently failing testQueryTimeout test

https://gerrit.wikimedia.org/r/1061083

Change #1061083 merged by jenkins-bot:

[mediawiki/core@master] Fix inconsistently failing testQueryTimeout test

https://gerrit.wikimedia.org/r/1061083

Dreamy_Jazz assigned this task to Lucas_Werkmeister_WMDE.

Tentatively resolving this as mitigations for this have been implemented.