Page MenuHomePhabricator

Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError"
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

086a7eed-5809-4aeb-bcf2-82fb504c0540

  • [[ URL | Logstash single document ]]
message
trace
Impact
Notes

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Also dropping here related request IDs from the help desk:

  • c867a72e-ac5d-4b74-ad11-bd9084c0a12d
  • 3228d684-e259-477d-a359-db71698e5abd
  • bb5ad4fc-e7b2-411b-92a8-78201adada13

It seems like it could have been a temporary hiccup, as 500 responses have plummeted again, but seem to be continuing at a lower level.

Nope, still an issue as of 12:28 PM UTC.

(same error, recent)

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBQueryError: Error 1205: Lock wait timeout exceeded; try restarting transaction
Function: WikiPage::lockAndGetLatest
Query: SELECT  page_latest  FROM `page`    WHERE page_id = 424871 AND page_namespace = 0 AND
exception.trace
from /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1237)
#0 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1221): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1195): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(679): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1384): Wikimedia\Rdbms\Database->query(Wikimedia\Rdbms\Query, string)
#4 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1335): Wikimedia\Rdbms\Database->select(array, string, array, string, array, array)
#5 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(119): Wikimedia\Rdbms\Database->selectField(array, string, array, string, array, array)
#6 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(338): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(726): Wikimedia\Rdbms\DBConnRef->selectField(array, string, array, string, array, array)
#8 /srv/mediawiki/php-1.42.0-wmf.7/includes/page/WikiPage.php(2703): Wikimedia\Rdbms\SelectQueryBuilder->fetchField()
#9 /srv/mediawiki/php-1.42.0-wmf.7/includes/Storage/PageUpdater.php(1357): WikiPage->lockAndGetLatest()
#10 /srv/mediawiki/php-1.42.0-wmf.7/includes/Storage/PageUpdater.php(922): MediaWiki\Storage\PageUpdater->doModify(MediaWiki\CommentStore\CommentStoreComment)
#11 /srv/mediawiki/php-1.42.0-wmf.7/includes/editpage/EditPage.php(2498): MediaWiki\Storage\PageUpdater->saveRevision(MediaWiki\CommentStore\CommentStoreComment, integer)
#12 /srv/mediawiki/php-1.42.0-wmf.7/includes/editpage/EditPage.php(1800): MediaWiki\EditPage\EditPage->internalAttemptSave(array, boolean, boolean)
#13 /srv/mediawiki/php-1.42.0-wmf.7/includes/editpage/EditPage.php(717): MediaWiki\EditPage\EditPage->attemptSave(array)
#14 /srv/mediawiki/php-1.42.0-wmf.7/includes/actions/EditAction.php(66): MediaWiki\EditPage\EditPage->edit()
#15 /srv/mediawiki/php-1.42.0-wmf.7/includes/actions/SubmitAction.php(38): EditAction->show()
#16 /srv/mediawiki/php-1.42.0-wmf.7/includes/MediaWiki.php(585): SubmitAction->show()
#17 /srv/mediawiki/php-1.42.0-wmf.7/includes/MediaWiki.php(365): MediaWiki->performAction(Article, MediaWiki\Title\Title)
#18 /srv/mediawiki/php-1.42.0-wmf.7/includes/MediaWiki.php(962): MediaWiki->performRequest()
#19 /srv/mediawiki/php-1.42.0-wmf.7/includes/MediaWiki.php(615): MediaWiki->main()
#20 /srv/mediawiki/php-1.42.0-wmf.7/index.php(50): MediaWiki->run()
#21 /srv/mediawiki/php-1.42.0-wmf.7/index.php(46): wfIndexMain()
#22 /srv/mediawiki/w/index.php(3): require(string)
#23 {main}
see also
{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/not_found","title":"Not found.","method":"get","uri":"/en.wikipedia.org/v1/page/html/Buck/1188282069","internalURI":"https://restbase-async.discovery.wmnet:7443/en.wikipedia.org/v1/page/html/Buck/1188282069","internalMethod":"get"}
Impact
Notes

(Potentially related)

On attempting a purge (action=purge&format=json&forcelinkupdate=1&forcerecursivelinkupdate=1&titles=Buck&formatversion=2) to the Buck article on en.wiki:

Error
normalized_message
Exception executing job: refreshLinksPrioritized Buck namespace=0 title=Buck rootJobTimestamp=20231103113626 useRecursiveLinksUpdate=1 triggeringUser={"userId":0,"userName":"182.209.167.241"} triggeringRevisionId=1183291441 causeAction=api-purge causeAgen
exception.trace
from /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1237)
#0 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1221): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1195): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(679): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/Database.php(1527): Wikimedia\Rdbms\Database->query(Wikimedia\Rdbms\Query, string)
#4 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(119): Wikimedia\Rdbms\Database->update(string, array, array, string, array)
#5 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(413): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#6 /srv/mediawiki/php-1.42.0-wmf.7/includes/libs/rdbms/querybuilder/UpdateQueryBuilder.php(332): Wikimedia\Rdbms\DBConnRef->update(string, array, array, string, array)
#7 /srv/mediawiki/php-1.42.0-wmf.7/includes/deferred/LinksUpdate/LinksUpdate.php(533): Wikimedia\Rdbms\UpdateQueryBuilder->execute()
#8 /srv/mediawiki/php-1.42.0-wmf.7/includes/deferred/LinksUpdate/LinksUpdate.php(233): MediaWiki\Deferred\LinksUpdate\LinksUpdate->updateLinksTimestamp()
#9 /srv/mediawiki/php-1.42.0-wmf.7/includes/deferred/LinksUpdate/LinksUpdate.php(183): MediaWiki\Deferred\LinksUpdate\LinksUpdate->doIncrementalUpdate()
#10 /srv/mediawiki/php-1.42.0-wmf.7/includes/deferred/DeferredUpdates.php(469): MediaWiki\Deferred\LinksUpdate\LinksUpdate->doUpdate()
#11 /srv/mediawiki/php-1.42.0-wmf.7/includes/deferred/RefreshSecondaryDataUpdate.php(103): MediaWiki\Deferred\DeferredUpdates::attemptUpdate(MediaWiki\Deferred\LinksUpdate\LinksUpdate)
#12 /srv/mediawiki/php-1.42.0-wmf.7/includes/deferred/DeferredUpdates.php(469): RefreshSecondaryDataUpdate->doUpdate()
#13 /srv/mediawiki/php-1.42.0-wmf.7/includes/Storage/DerivedPageDataUpdater.php(1837): MediaWiki\Deferred\DeferredUpdates::attemptUpdate(RefreshSecondaryDataUpdate)
#14 /srv/mediawiki/php-1.42.0-wmf.7/includes/page/WikiPage.php(2113): MediaWiki\Storage\DerivedPageDataUpdater->doSecondaryDataUpdates(array)
#15 /srv/mediawiki/php-1.42.0-wmf.7/includes/jobqueue/jobs/RefreshLinksJob.php(244): WikiPage->doSecondaryDataUpdates(array)
#16 /srv/mediawiki/php-1.42.0-wmf.7/includes/jobqueue/jobs/RefreshLinksJob.php(162): RefreshLinksJob->runForTitle(MediaWiki\Title\Title)
#17 /srv/mediawiki/php-1.42.0-wmf.7/extensions/EventBus/includes/JobExecutor.php(80): RefreshLinksJob->run()
#18 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#19 {main}
Impact
Notes

From what I've seen on the help desk (1 2 3 since my last comment here), VPT, and TEA (1 2), there's also a mentions of a Wikimedia\Rdbms\DBTransactionSizeError happening intermittently. The issue seems to appear on specific articles, with (as of now) no discernible pattern in what articles are impacted. It also just clears itself after some time, but there's been reports of this as recent as 10 minutes ago.

TheresNoTime triaged this task as Unbreak Now! priority.EditedDec 4 2023, 1:43 PM

[57fe5eb8-9bfc-4b3c-83ed-d9de7c06b5a6] 2023-12-04 13:38:13: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" on https://en.wikipedia.org/wiki/TVB_(disambiguation) when attempting to revert or save an edit, same

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/not_found","title":"Not found.","method":"get","uri":"/en.wikipedia.org/v1/page/html/TVB_(disambiguation)/1188292473","internalURI":"https://restbase-async.discovery.wmnet:7443/en.wikipedia.org/v1/page/html/TVB_(disambiguation)/1188292473","internalMethod":"get"}

error.

Going to bump this to UBN! as it appears to be affecting more editors.


Current repro:

OR

I don't know how restbase or hyperswitch ended up in critical path of saving edits, that is a rather important issue we need to check.

The general cause of DBTransactionSizeError is something like this:

  • A write to db is being triggered way too early in the request, opening a write transaction on master
  • MediaWiki starts doing other things and keeping it open.
  • It times out and causes rollback of everything.

I've seen this before but can't say with 100% certainty this is the case here without properly looking.

@Ladsgroup I think the log linked by @TheresNoTime is a typical example of a distributed transaction going wrong:

  • We start the db transaction for the edit
  • We send out the event to eventgate about this new revision
  • Something happens that causes the transaction to rollback for $reasons
  • changeprop will catch the event and try to pre-generate the cache in restbase

So I don't think the restbase error is causing the problem, but rather a consequence :)

Yup, after looking at logs properly, it's clear.

I have to go to a meeting, if someone is willing to reproduce the issue in mwdebug while verbose log (there is an option for it in x-debug) is enabled and look at the queries, it should be fixable really quickly

I've gotten this error twice, when trying to make the same simple edit to a page

A database query error has occurred. This may indicate a bug in the software.

[9d5d7fdb-de46-4a8e-af0f-44b92734e276] 2023-12-04 14:27:43: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

Got a verbose log for [e7bc3819-b052-43a3-a9e2-438ae9d4b38f] 2023-12-04 15:01:09: Fatal exception of type "Wikimedia\Rdbms\DBQueryError", on article https://en.wikipedia.org/wiki/CDDAhttps://logstash.wikimedia.org/goto/68be5487036807f7bb23ecfef5dd9bf5

I got 3 different error messages multiple times today while editing:

  • "Server returned error: HTTP 500."
  • "[XXXX-XXX-XXX-XXX-XXX] Caught exception of type Wikimedia\Rdbms\DBQueryError"
  • "Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

If I refresh the page to try to publish my edits again, one message may appear instead of another (e.g., I click "Public changes" then error 500 appears, I click "Try again" and then "Caught exception" would appear. And vice versa) so I assume all these errors are somehow related.

It happened again

[80340636-5581-4e19-a4ce-a0a6b2a7215e] 2023-12-04 15:23:08: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

At least for TVB, I can't reproduce it anymore: https://en.wikipedia.org/w/index.php?title=TVB_(disambiguation)&action=history Can someone give me a reproducible error to look at?

	Expectation (writeQueryTime <= 1) by MediaWiki::main not met (actual: 7.6661319732666) in trx #1701ce9c66:
role-primary: SELECT page_latest FROM `page` WHERE page_id = N AND page_namespace = N AND page_title = 'X' LIMIT N FOR UPDATE

Yup, the exact thing I was suspecting, now trying to find what is the doing the first write query.

This is also happening on https://en.wikipedia.org/wiki/CDDA with the following error:

[0458d586-c21c-4c1b-bc95-35edbaabe49d] 2023-12-04 15:33:32: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

Hi, We get a log error for each one of these, we see them and I'm investigating. No need to paste them here anymore. Thanks!

The errors increased sharply around 6:30 UTC today: (searching for exception.class Wikimedia\Rdbms\DBTransactionSizeError)

https://logstash.wikimedia.org/goto/6ed84bb6b1c78f2f448265f540a689d7

image.png (528×2 px, 84 KB)

I don't see anything suspicious logged in https://wikitech.wikimedia.org/wiki/Server_Admin_Log#2023-12-04 at the time…

Change 979996 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Category: Stop locking thousands of rows

https://gerrit.wikimedia.org/r/979996

Change 979692 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.42.0-wmf.7] Category: Stop locking thousands of rows

https://gerrit.wikimedia.org/r/979692

Change 980010 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] Category: Update comments

https://gerrit.wikimedia.org/r/980010

One thing to also fix here is that things like SELECT FOR UPDATE, SELECT GET_LOCK()...any SELECT really...should be exempted from the transaction size check in approvePrimaryChanges(). There is no use in ROLLBACK at that point. I'll make a patch for that.

This comment was removed by aaron.

Change 979996 merged by jenkins-bot:

[mediawiki/core@master] Category: Stop locking thousands of rows

https://gerrit.wikimedia.org/r/979996

Change 979692 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.7] Category: Stop locking thousands of rows

https://gerrit.wikimedia.org/r/979692

Mentioned in SAL (#wikimedia-operations) [2023-12-04T17:33:18Z] <ladsgroup@deploy2002> Started scap: Backport for [[gerrit:979692|Category: Stop locking thousands of rows (T352628)]]

Mentioned in SAL (#wikimedia-operations) [2023-12-04T17:34:48Z] <ladsgroup@deploy2002> ladsgroup: Backport for [[gerrit:979692|Category: Stop locking thousands of rows (T352628)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-12-04T17:41:25Z] <ladsgroup@deploy2002> Finished scap: Backport for [[gerrit:979692|Category: Stop locking thousands of rows (T352628)]] (duration: 08m 07s)

Ladsgroup edited projects, added DBA; removed Patch-For-Review.
Ladsgroup moved this task from Triage to Done on the DBA board.
Ladsgroup removed a project: SRE.

Change 980010 merged by jenkins-bot:

[mediawiki/core@master] Category: Update comments

https://gerrit.wikimedia.org/r/980010

Change 980026 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: do not count SELECT queries toward max transaction write duration

https://gerrit.wikimedia.org/r/980026