Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory' (not 'within-rollback-session')
Closed, Duplicate | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory' (not 'within-rollback-session')
exception.trace
from /srv/mediawiki/php-1.38.0-wmf.26/includes/libs/rdbms/lbfactory/LBFactory.php(829)
#0 /srv/mediawiki/php-1.38.0-wmf.26/includes/libs/rdbms/lbfactory/LBFactory.php(291): Wikimedia\Rdbms\LBFactory->assertTransactionRoundStage(string)
#1 /srv/mediawiki/php-1.38.0-wmf.26/includes/MediaWiki.php(678): Wikimedia\Rdbms\LBFactory->commitPrimaryChanges(string, array)
#2 /srv/mediawiki/php-1.38.0-wmf.26/includes/api/ApiMain.php(896): MediaWiki::preOutputCommit(DerivativeContext)
#3 /srv/mediawiki/php-1.38.0-wmf.26/includes/api/ApiMain.php(841): ApiMain->executeActionWithErrorHandling()
#4 /srv/mediawiki/php-1.38.0-wmf.26/api.php(90): ApiMain->execute()
#5 /srv/mediawiki/php-1.38.0-wmf.26/api.php(45): wfApiMain()
#6 /srv/mediawiki/w/api.php(3): require(string)
#7 {main}
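
The trace shows `commitPrimaryChanges()` calling `assertTransactionRoundStage()`, which throws because the round stage is 'within-rollback-session' rather than 'cursory'. Below is a minimal Python sketch of that kind of stage guard. It is a toy model, not the MediaWiki code: the `RoundTracker` class, its method names, and the idea that an aborted session flush leaves the stage set are all assumptions made for illustration. Only the two stage names and the message wording come from the log above.

```python
class TransactionRoundError(RuntimeError):
    """Hypothetical stand-in for the DBTransactionError seen in the trace."""


class RoundTracker:
    """Toy model of a transaction-round stage guard (not MediaWiki code)."""

    CURSORY = "cursory"
    ROLLBACK_SESSION = "within-rollback-session"

    def __init__(self):
        self.stage = self.CURSORY

    def assert_stage(self, expected):
        # Same wording as the logged message.
        if self.stage != expected:
            raise TransactionRoundError(
                f"Transaction round stage must be '{expected}' (not '{self.stage}')"
            )

    def commit_primary_changes(self):
        # Committing is only allowed when no other round operation is in progress.
        self.assert_stage(self.CURSORY)
        return "committed"

    def flush_sessions(self, fail=False):
        # Assumption: if the flush aborts midway, the stage is never reset.
        self.assert_stage(self.CURSORY)
        self.stage = self.ROLLBACK_SESSION
        if fail:
            raise RuntimeError("flush aborted")
        self.stage = self.CURSORY


tracker = RoundTracker()
try:
    tracker.flush_sessions(fail=True)
except RuntimeError:
    pass  # the stage is still 'within-rollback-session' here

try:
    tracker.commit_primary_changes()
except TransactionRoundError as err:
    print(err)
```

In this sketch a later commit fails only because an earlier operation left the stage in a non-cursory state. Whether that is what happened in production is not established by this task.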
Impact

Train rolled to group0, 24 errors in the last 30 min

Notes

Event Timeline

Caused by the same thing: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/737824/

I'm inclined to revert that patch and its follow-ups.

I've told the author of that patch this before, and this is not the first time. Making really large patches like this in a critical part of the infrastructure is bound to break things, especially when a patch that is supposed to do one thing has a dedicated "Also" section in its commit message and then does at least eight more things on top. These kinds of patches are a recipe for disaster and can easily cause data corruption.

I think there is a patch up for review that fixes this: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/770935/, but a revert could also be done.

I suggest deploying that fix. If it fixes everything and there are no more errors, fine. If we find more errors, I will revert the whole chain.

Change 771081 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: use the LoadBalancer id in flushPrimarySessions()

https://gerrit.wikimedia.org/r/771081

jeena triaged this task as Unbreak Now! priority. Mar 16 2022, 5:12 PM

Change 771081 merged by jenkins-bot:

[mediawiki/core@master] rdbms: fix owner id and RELEASE_ALL_LOCKS query in session flushing methods

https://gerrit.wikimedia.org/r/771081

Change 770938 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@wmf/1.38.0-wmf.26] rdbms: fix owner id and RELEASE_ALL_LOCKS query in session flushing methods

https://gerrit.wikimedia.org/r/770938

Ladsgroup claimed this task.

Reverted the whole thing. Please do not re-apply the patch without a major rethink of how it should be done and how it should be deployed. I will try to find a place to write an essay about this.

Change 770938 abandoned by Aaron Schulz:

[mediawiki/core@wmf/1.38.0-wmf.26] rdbms: fix owner id and RELEASE_ALL_LOCKS query in session flushing methods

Reason:

https://gerrit.wikimedia.org/r/770938