Page MenuHomePhabricator

Wikibase quibble api tests failing on an empty change
Closed, ResolvedPublic

Description

When running tests on an empty change, the quibble-apitests-only-vendor-php81 job results in failure.

console logs for the job: https://integration.wikimedia.org/ci/job/quibble-apitests-only-vendor-php81/1729/console

13:43:58   1) resource too large
13:43:58        resource (item) is too large:
13:43:58 
13:43:58       AssertionError: 
13:43:58 7Response body: {}8
13:43:58 Invalid status: expected 500 to equal 400      
13:43:58       actual expected
13:43:58       
13:43:58       500400
13:43:58       
13:43:58       at Proxy.<anonymous> (tests/mocha/helpers/chaiHelper.js:91:8)
13:43:58       at Proxy.chainableMethodWrapper (node_modules/chai/lib/chai/utils/addChainableMethod.js:113:49)
13:43:58       at assertValidError (/workspace/src/extensions/Wikibase/repo/domains/crud/tests/mocha/helpers/responseValidator.js:8:30)
13:43:58       at assertResourceTooLargeResponse (/workspace/src/extensions/Wikibase/repo/domains/crud/tests/mocha/api-testing/ResourceTooLargeTest.js:13:2)
13:43:58       at Context.<anonymous> (/workspace/src/extensions/Wikibase/repo/domains/crud/tests/mocha/api-testing/ResourceTooLargeTest.js:55:3)
13:43:58       at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
13:43:58 
13:43:58   2) resource too large
13:43:58        resource (property) is too large:
13:43:58 
13:43:58       AssertionError: 
13:43:58 7Response body: {}8
13:43:58 Invalid status: expected 500 to equal 400      
13:43:58       actual expected
13:43:58       
13:43:58       500400
13:43:58       
13:43:58       at Proxy.<anonymous> (tests/mocha/helpers/chaiHelper.js:91:8)
13:43:58       at Proxy.chainableMethodWrapper (node_modules/chai/lib/chai/utils/addChainableMethod.js:113:49)
13:43:58       at assertValidError (/workspace/src/extensions/Wikibase/repo/domains/crud/tests/mocha/helpers/responseValidator.js:8:30)
13:43:58       at assertResourceTooLargeResponse (/workspace/src/extensions/Wikibase/repo/domains/crud/tests/mocha/api-testing/ResourceTooLargeTest.js:13:2)
13:43:58       at Context.<anonymous> (/workspace/src/extensions/Wikibase/repo/domains/crud/tests/mocha/api-testing/ResourceTooLargeTest.js:67:3)
13:43:58       at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
13:43:58 
13:43:58 
13:43:58 
13:43:58 ERROR: "api-testing:e2e-crud" exited with 2.
13:43:59 ERROR: "api-testing:rest-api" exited with 1.
13:43:59 INFO:quibble.commands:<<< Finish: Run API-Testing, in 362.032 s
13:43:59 DEBUG:quibble.cmd:No quibble.yaml in /workspace/src/extensions/Wikibase
13:43:59 INFO:backend.ChromeWebDriver:Terminating ChromeWebDriver
13:43:59 INFO:backend.Xvfb:Terminating Xvfb
13:43:59 INFO:backend.MySQL:Terminating MySQL
13:43:59 Traceback (most recent call last):
13:43:59   File "/usr/local/bin/quibble", line 7, in <module>
13:43:59     sys.exit(main())
13:43:59   File "/usr/local/lib/python3.9/dist-packages/quibble/cmd.py", line 997, in main
13:43:59     cmd.execute(
13:43:59   File "/usr/local/lib/python3.9/dist-packages/quibble/cmd.py", line 647, in execute
13:43:59     raise called_process_error
13:43:59   File "/usr/local/lib/python3.9/dist-packages/quibble/cmd.py", line 633, in execute
13:43:59     quibble.commands.execute_command(command)
13:43:59   File "/usr/local/lib/python3.9/dist-packages/quibble/commands.py", line 33, in execute_command
13:43:59     command.execute()
13:43:59   File "/usr/local/lib/python3.9/dist-packages/quibble/commands.py", line 1129, in execute
13:43:59     run(
13:43:59   File "/usr/local/lib/python3.9/dist-packages/quibble/commands.py", line 76, in run
13:43:59     raise subprocess.CalledProcessError(
13:43:59 subprocess.CalledProcessError: Command '['npm', 'run', 'api-testing']' returned non-zero exit status 1.

Event Timeline

The mw-error.log for that build shows:

2025-09-30 11:40:47 956cb3e16e6d wikidb: [53d2b8871a415bf5cf6b538c] /rest.php/wikibase/v1/entities/items/Q98   Wikimedia\Rdbms\DBTransactionError: Explicit transaction still active; a caller might have failed to call endAtomic() or cancelAtomic().
#0 /workspace/src/includes/libs/rdbms/lbfactory/LBFactory.php(335): Wikimedia\Rdbms\LoadBalancer->approvePrimaryChanges(int, string)
#1 /workspace/src/includes/MediaWikiEntryPoint.php(294): Wikimedia\Rdbms\LBFactory->commitPrimaryChanges(string, int)
#2 /workspace/src/includes/MediaWikiEntryPoint.php(187): MediaWiki\MediaWikiEntryPoint->commitMainTransaction()
#3 /workspace/src/includes/MediaWikiEntryPoint.php(170): MediaWiki\MediaWikiEntryPoint->doPrepareForOutput()
#4 /workspace/src/includes/Rest/EntryPoint.php(236): MediaWiki\MediaWikiEntryPoint->prepareForOutput()
#5 /workspace/src/includes/MediaWikiEntryPoint.php(198): MediaWiki\Rest\EntryPoint->execute()
#6 /workspace/src/rest.php(39): MediaWiki\MediaWikiEntryPoint->run()
#7 {main}
2025-09-30 11:40:48 956cb3e16e6d wikidb: [7cdb6a771fd277fcb728ba20] /rest.php/wikibase/v1/entities/properties/P78   Wikimedia\Rdbms\DBTransactionError: Explicit transaction still active; a caller might have failed to call endAtomic() or cancelAtomic().
#0 /workspace/src/includes/libs/rdbms/lbfactory/LBFactory.php(335): Wikimedia\Rdbms\LoadBalancer->approvePrimaryChanges(int, string)
#1 /workspace/src/includes/MediaWikiEntryPoint.php(294): Wikimedia\Rdbms\LBFactory->commitPrimaryChanges(string, int)
#2 /workspace/src/includes/MediaWikiEntryPoint.php(187): MediaWiki\MediaWikiEntryPoint->commitMainTransaction()
#3 /workspace/src/includes/MediaWikiEntryPoint.php(170): MediaWiki\MediaWikiEntryPoint->doPrepareForOutput()
#4 /workspace/src/includes/Rest/EntryPoint.php(236): MediaWiki\MediaWikiEntryPoint->prepareForOutput()
#5 /workspace/src/includes/MediaWikiEntryPoint.php(198): MediaWiki\Rest\EntryPoint->execute()
#6 /workspace/src/rest.php(39): MediaWiki\MediaWikiEntryPoint->run()
#7 {main}
Lucas_Werkmeister_WMDE renamed this task from quibble api tests failing on an empty change to Wikibase quibble api tests failing on an empty change.Sep 30 2025, 12:40 PM
Lucas_Werkmeister_WMDE assigned this task to Jakob_WMDE.
Lucas_Werkmeister_WMDE added a subscriber: Jakob_WMDE.

(@Jakob_WMDE is looking into this at the moment.)

I wonder if Migrate pagelinks to virtual domain could be related… it was merged recently in core, and it’s at least related to the database.

I was able to reproduce it locally and found that https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1191321 is the commit that causes the test to fail. @Ladsgroup any immediate ideas what's happening there?

Marking as a train blocker for next week out of caution, since it sounds like the DBTransactionError would also be possible outside of the tests.

I was able to reproduce it locally and found that https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1191321 is the commit that causes the test to fail. @Ladsgroup any immediate ideas what's happening there?

Maybe I am doing something wrong, but in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1192548 the test case still fails if it depends on a revert of that patch.

Maybe I am doing something wrong, but in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1192548 the test case still fails if it depends on a revert of that patch.

The test mentioned in the task description here passes on the patch you've linked. There is a different kind of error there, interestingly. Looks like all tests are passing but a deadlock error is logged and causes the job to fail. I think I've seen that before, too, but it's unrelated to the issue here.

The deadlock error seems to be tracked at T375721: Flaky api-testing: database deadlocks FWIW (and yes, it happens from time to time and is unrelated).

I’ve tried to look at the mw-debug-web.log.gz log, but I’m getting the strong sense that log messages from multiple concurrent requests are interleaved in there, so it seems borderline impossible to make sense of the messages (e.g. “startAtomic: entering level n” and “endAtomic: leaving level n”).

@Zabe, @Ladsgroup: any idea how that patch could possibly cause the transaction errors seen in T406027#11229184? I’m at a loss. (But I checked a few other broken builds and the same errors show up there, so it doesn’t look like the transaction error is just an unrelated flakiness or something.)

Change #1192582 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] Revert "RevisionStore: Correctly respect the search limit for identical revs"

https://gerrit.wikimedia.org/r/1192582

I’ve tried to look at the mw-debug-web.log.gz log, but I’m getting the strong sense that log messages from multiple concurrent requests are interleaved in there, so it seems borderline impossible to make sense of the messages (e.g. “startAtomic: entering level n” and “endAtomic: leaving level n”).

@Zabe, @Ladsgroup: any idea how that patch could possibly cause the transaction errors seen in T406027#11229184? I’m at a loss. (But I checked a few other broken builds and the same errors show up there, so it doesn’t look like the transaction error is just an unrelated flakiness or something.)

From a semi-brief look, I also have no idea how the two select queries in that patch caused this. I +2'ed the revert for now in order to unbreak wikibase CI.

Change #1192582 merged by jenkins-bot:

[mediawiki/core@master] Revert "RevisionStore: Correctly respect the search limit for identical revs"

https://gerrit.wikimedia.org/r/1192582

@Silvan_WMDE noted that the too-big content would previously probably have thrown an EntityContentTooBigException in the $slot->getSha1() call, but now the method returns early before calling this method if no candidate revisions are found. (So I guess this would be limited only to brand-new items that only have one revision, where the candidate revisions are empty due to the OFFSET 1?)

Change #1192841 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] Reapply "RevisionStore: Correctly respect the search limit for identical revs"

https://gerrit.wikimedia.org/r/1192841

Change #1192841 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] Reapply "RevisionStore: Correctly respect the search limit for identical revs"

https://gerrit.wikimedia.org/r/1192841

Let’s see if moving the $slot->getSha1() call before the early return helps; I’m testing CI in this Wikibase change.

That seems to help; the “is too large” tests are green in this build.

I’m guessing there’s some other bug, either in MediaWiki core or in Wikibase, where $slot->getSha1() throwing an EntityContentTooBigException causes a transaction to not be canceled properly; and this bug is masked if findIdenticalRevision() already throws the same exception earlier (perhaps even before the transaction begins).

Not sure if Wikibase is wrong to throw the exception, or if MediaWiki core is handling it wrongly. But anyway, that reapply change seems to work for now.

Change #1192841 merged by jenkins-bot:

[mediawiki/core@master] Reapply "RevisionStore: Correctly respect the search limit for identical revs"

https://gerrit.wikimedia.org/r/1192841

jnuche subscribed.

Removed from T405678 since a fix was merged.

Thanks everyone for working on this!

@jnuche it’s probably worth watching out for a recurrence of T406208 in wmf.22, just in case :) we fixed that for wmf.22 by reverting the original RevisionStore sha1 update, so IIRC the revised query which was merged in here has not yet been tested in production.

@Lucas_Werkmeister_WMDE Reading T406208 it looks like I can check once the train gets to the English wikipedia on Thurs. Do you happen to know if/how that could be tested on a different wiki before then?

No idea, sorry. If I understand correctly, T406208 would only affect specific pages with a certain history pattern, but I don’t know what exactly the problematic pattern is nor how to find other pages that might be affected.

The three pages linked in the task description over there seem to be edited fairly frequently, so it’s probably enough to check that they’re still receiving edits, say, about an hour after rolling the train forward to enwiki. (Or just see if any complaints come in.)

[...]
Not sure if Wikibase is wrong to throw the exception, or if MediaWiki core is handling it wrongly. But anyway, that reapply change seems to work for now.

I am also not sure, but I do think it should not rely on findIdenticalRevision.