
Possible uptick in "DBTransactionSizeError: Transaction spent [n] second(s) in writes, exceeding the limit of 3"
Closed, Duplicate | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBTransactionSizeError: Transaction spent 5.9580748081207 second(s) in writes, exceeding the limit of 3
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1750)
#0 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(2253): Wikimedia\Rdbms\LoadBalancer::Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\DatabaseMysqli)
#1 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1764): Wikimedia\Rdbms\LoadBalancer->forEachOpenMasterConnection(Closure)
#2 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/lbfactory/LBFactory.php(249): Wikimedia\Rdbms\LoadBalancer->approveMasterChanges(array, string, integer)
#3 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(236): Wikimedia\Rdbms\LBFactory::Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\LoadBalancer, string, array)
#4 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/lbfactory/LBFactory.php(251): Wikimedia\Rdbms\LBFactoryMulti->forEachLB(Closure, array)
#5 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/lbfactory/LBFactory.php(310): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod(string, array)
#6 /srv/mediawiki/php-1.37.0-wmf.4/includes/MediaWiki.php(675): Wikimedia\Rdbms\LBFactory->commitMasterChanges(string, array)
#7 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiMain.php(664): MediaWiki::preOutputCommit(DerivativeContext)
#8 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiMain.php(609): ApiMain->executeActionWithErrorHandling()
#9 /srv/mediawiki/php-1.37.0-wmf.4/api.php(90): ApiMain->execute()
#10 /srv/mediawiki/php-1.37.0-wmf.4/api.php(45): wfApiMain()
#11 /srv/mediawiki/w/api.php(3): require(string)
#12 {main}
Notes

Seemed like I was seeing an unusual number of these while doing log triage for 1.37.0-wmf.4. Looking at logstash, there's an apparent uptick starting in late April 2021:

Screenshot-2021-05-06-12:06:30.png (251×674 px, 20 KB)

No idea if these are just a product of database weather or a result of some code change, but it seemed worth surfacing. Most are on commons, some on enwiki.

Details

Request URL
https://commons.wikimedia.org/w/api.php?format=*&maxlag=*&action=upload

Event Timeline

Marostegui subscribed.

We (DBAs) cannot do anything about it: the transaction took longer than expected (it could be a one-off issue, or it could be a code issue) and was then rolled back.
I don't see anything wrong with the database layer at this point. https://logstash.wikimedia.org/goto/343f8796c3282c67d115f82352443e16
Removing the DBA tag for now, but I will stay subscribed to the task in case I am needed.

There are also other issues on action=upload with big files, see T278389 or T280926, possibly all related.
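
For readers unfamiliar with this error, here is a minimal Python sketch of the behavior described above: time spent in writes is compared against a limit before commit, and the transaction is rolled back if the limit is exceeded. This is not MediaWiki's rdbms code. The names `TransactionSizeError`, `approve_pending_writes`, and `commit_or_rollback` are hypothetical, and the only details taken from this task are the 3-second limit in the error message and the fact that the trace shows the error raised during `commitMasterChanges`.

```python
class TransactionSizeError(Exception):
    """Hypothetical stand-in for the DBTransactionSizeError in the trace."""


def approve_pending_writes(write_durations, max_write_seconds=3.0):
    """Return total write time in seconds, or raise if it exceeds the limit.

    write_durations: per-query write times in seconds (assumed to have been
    measured by the caller while the transaction was open).
    """
    total = sum(write_durations)
    if total > max_write_seconds:
        raise TransactionSizeError(
            f"Transaction spent {total} second(s) in writes, "
            f"exceeding the limit of {max_write_seconds:g}"
        )
    return total


def commit_or_rollback(write_durations, max_write_seconds=3.0):
    """Run the pre-commit check and report the outcome as (status, error)."""
    try:
        approve_pending_writes(write_durations, max_write_seconds)
    except TransactionSizeError as err:
        # The whole transaction is discarded, so no writes become visible.
        return ("rolled back", str(err))
    return ("committed", None)
```

Under these assumptions, `commit_or_rollback([1.0, 1.5])` reports a commit, while a slow upload whose writes total close to 6 seconds is rolled back. That matches the DBA's point: the database did what it was told, and the open question is why the writes took that long.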

BPirkle triaged this task as Medium priority. May 11 2021, 7:13 PM
BPirkle subscribed.

Moving to Clinic Duty for investigation. I triaged as Medium priority. Feel free to bump it to High if you think that is warranted.