Page MenuHomePhabricator

Wikimedia\Rdbms\DBTransactionSizeError: Transaction spent 3.6s in writes, exceeding the 3s limit
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBTransactionSizeError: Transaction spent 3.6s in writes, exceeding the 3s limit
exception.trace
from /srv/mediawiki/php-1.38.0-wmf.19/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1724)
#0 /srv/mediawiki/php-1.38.0-wmf.19/includes/libs/rdbms/loadbalancer/LoadBalancer.php(2241): Wikimedia\Rdbms\LoadBalancer->Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\DatabaseMysqli)
#1 /srv/mediawiki/php-1.38.0-wmf.19/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1747): Wikimedia\Rdbms\LoadBalancer->forEachOpenPrimaryConnection(Closure)
#2 /srv/mediawiki/php-1.38.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactory.php(249): Wikimedia\Rdbms\LoadBalancer->approvePrimaryChanges(array, string, integer)
#3 /srv/mediawiki/php-1.38.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(230): Wikimedia\Rdbms\LBFactory::Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\LoadBalancer, string, array)
#4 /srv/mediawiki/php-1.38.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactory.php(251): Wikimedia\Rdbms\LBFactoryMulti->forEachLB(Closure, array)
#5 /srv/mediawiki/php-1.38.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactory.php(310): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod(string, array)
#6 /srv/mediawiki/php-1.38.0-wmf.19/includes/MediaWiki.php(677): Wikimedia\Rdbms\LBFactory->commitPrimaryChanges(string, array)
#7 /srv/mediawiki/php-1.38.0-wmf.19/includes/MediaWiki.php(647): MediaWiki::preOutputCommit(RequestContext, Closure)
#8 /srv/mediawiki/php-1.38.0-wmf.19/includes/MediaWiki.php(926): MediaWiki->doPreOutputCommit(Closure)
#9 /srv/mediawiki/php-1.38.0-wmf.19/includes/MediaWiki.php(563): MediaWiki->main()
#10 /srv/mediawiki/php-1.38.0-wmf.19/index.php(53): MediaWiki->run()
#11 /srv/mediawiki/php-1.38.0-wmf.19/index.php(46): wfIndexMain()
#12 /srv/mediawiki/w/index.php(3): require(string)
#13 {main}
Impact

Seen multiple occurrences after deploying to group1

Notes

All instances appear to be for categories on Commons.

Event Timeline

brennen triaged this task as Unbreak Now! priority.Jan 26 2022, 8:32 PM
brennen updated the task description. (Show Details)
brennen added a project: User-brennen.

Is this just timing for certain kinds of edits being saved (?) got so much slower that the DB commit self-terminated?

Change 757533 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Normalize DBTransactionSizeError message

https://gerrit.wikimedia.org/r/757533

This doesn't fix the issue, just fixes the error messages ^

I don't know exactly what but something changed the edit transaction and added PageEditStash to it. Compare these two:
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-2022.01.26?id=2doLmH4Bj9Ddf3wKnd0H
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-2022.01.26?id=LBgNmH4B7YMytsAjMJGw

I thought it could be the changes I did to flaggedrevs to make it use edit stash but lots of these logs are in commons where flaggedrevs is not even installed. I dig a bit but it's quite late here, if anyone wants to take a look. That'd be awesome.

The very likely cause is that https://gerrit.wikimedia.org/r/c/mediawiki/core/+/746985/2/includes/libs/rdbms/database/Database.php#1129

It makes the acquiring and removing locks a write query so it would get into the transaction (I admit I'm not super sure).

I suggest reverting that patch and moving forward and see if it continues to happen.

For a longer term solution, I recommend removing db part of PageEditStash, it should lock using the cache, not the database, at worst it can use redis lock manager we use for files.

My first guess would be https://gerrit.wikimedia.org/r/c/mediawiki/core/+/746985.

Edit: Ladsgroup was apparently faster than me

Change 757473 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@wmf/1.38.0-wmf.19] Revert \"rdbms: cleanup the use of QUERY_ flags to query() in Database\"

https://gerrit.wikimedia.org/r/757473

Thanks @Ladsgroup. We'll make one more attempt at rolling train forward to group1, assuming it passes.

Mentioned in SAL (#wikimedia-operations) [2022-01-26T23:34:35Z] <brennen> train 1.38.0-wmf.19 (T293960): parking the train at group0 until US morning; we have a probable fix for T300194 but CI is having issues

Change 757556 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Don't consider lock waits to be write queries

https://gerrit.wikimedia.org/r/757556

I reproduced this locally, my fix above seems to be sufficient. I'm pretty sure the intention of Aaron's change was to replace QUERY_IGNORE_DBO_TRX | QUERY_CHANGE_NONE with QUERY_CHANGE_LOCKS purely for brevity, with the same semantics applied.

Change 757476 had a related patch set uploaded (by Ladsgroup; author: Tim Starling):

[mediawiki/core@wmf/1.38.0-wmf.19] Don't consider lock waits to be write queries

https://gerrit.wikimedia.org/r/757476

Change 757473 abandoned by Ladsgroup:

[mediawiki/core@wmf/1.38.0-wmf.19] Revert \"rdbms: cleanup the use of QUERY_ flags to query() in Database\"

Reason:

Proper fix is in place: I29a02c87156a36

https://gerrit.wikimedia.org/r/757473

Change 757556 merged by jenkins-bot:

[mediawiki/core@master] Don't consider lock waits to be write queries

https://gerrit.wikimedia.org/r/757556

Change 757476 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.19] Don't consider lock waits to be write queries

https://gerrit.wikimedia.org/r/757476

Mentioned in SAL (#wikimedia-operations) [2022-01-27T08:13:43Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.19/includes/libs/rdbms/database/Database.php: Backport: [[gerrit:757476|Don't consider lock waits to be write queries (T300194)]] (duration: 00m 52s)

Ladsgroup assigned this task to tstarling.

Change 757533 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Normalize DBTransactionSizeError message

https://gerrit.wikimedia.org/r/757533