Page MenuHomePhabricator

Unable to create redirect on dewiki - fatal DBQueryError
Closed, DuplicatePublicPRODUCTION ERROR

Description

2019-04-08.png (380×1 px, 49 KB)

Hi,

by creating a redirect at german-spoken Wikipedia get this error. For more times:

  • [XKsNEgpAICwAAKQZzWAAAACU] 2019-04-08 08:58:10: Fataler Ausnahmefehler des Typs „Wikimedia\Rdbms\DBQueryError“
  • [XKsNbgpAICkAAFIiNiYAAACW] 2019-04-08 08:59:42: Fataler Ausnahmefehler des Typs „Wikimedia\Rdbms\DBQueryError“

Event Timeline

message
A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: INSERT INTO `redirect` (rd_from,rd_namespace,rd_title,rd_fragment,rd_interwiki) VALUES ([REDACTED],'0',[REDACTED],[REDACTED],[REDACTED]) ON DUPLICATE KEY UPDATE rd_namespace = '0',rd_title = [REDACTED],rd_fragment = '',rd_interwiki = ''
Function: WikiPage::insertRedirectEntry
Error: 1205 Lock wait timeout exceeded; try restarting transaction
trace
#0 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1530): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1248): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/DatabaseMysqlBase.php(1347): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1091): Wikimedia\Rdbms\DatabaseMysqlBase->upsert(string, array, array, array, string)
#4 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1464): WikiPage->insertRedirectEntry(Title)
#5 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1423): WikiPage->updateRedirectOn(Wikimedia\Rdbms\DBConnRef, Title, NULL)
#6 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/PageUpdater.php(1125): WikiPage->updateRevisionOn(Wikimedia\Rdbms\DBConnRef, Revision, integer)
#7 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/PageUpdater.php(758): MediaWiki\Storage\PageUpdater->doCreate(CommentStoreComment, User, integer)
#8 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1916): MediaWiki\Storage\PageUpdater->saveRevision(CommentStoreComment, integer)
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/EditPage.php(2304): WikiPage->doEditContent(WikitextContent, CommentStoreComment, integer, boolean, User, string, array, integer)
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/EditPage.php(1595): EditPage->internalAttemptSave(array, boolean)
#11 /srv/mediawiki/php-1.33.0-wmf.23/includes/EditPage.php(694): EditPage->attemptSave(array)
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/actions/EditAction.php(60): EditPage->edit()
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/actions/SubmitAction.php(38): EditAction->show()
#14 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(501): SubmitAction->show()
#15 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(294): MediaWiki->performAction(Article, Title)
#16 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(867): MediaWiki->performRequest()
#17 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(517): MediaWiki->main()
#18 /srv/mediawiki/php-1.33.0-wmf.23/index.php(42): MediaWiki->run()
#19 /srv/mediawiki/w/index.php(3): include(string)
#20 {main}

(Not really private data in there, but redacted just in case)

Hi,

I'm a user at german-spoken Wikipedia. This error get by creating a redirect. About the technical background o.O ;)

Repro:

[XKsV7QpAIDAAAJ8nsi8AAAAW] 2019-04-08 09:35:57: Fataler Ausnahmefehler des Typs „Wikimedia\Rdbms\DBQueryError“

Update:
If using function Preview (Vorschau) after writing #redirect and than adding a category no error message.

  1. Update:

The error isn't outgoing. He exists furthermore.

Krinkle renamed this task from database error by creating a redirect to Unable to create redirect on dewiki - fatal DBQueryError.Apr 8 2019, 6:11 PM

There have been previous reports about problems creating redirects, but those were about specific edge cases. See also T120140 and T163597.

I've looked into this report from @Markus_Schulenburg and confirmed it is about very plain creation of a wikipage as a redirect.

For example:

Error
  • Request ID: XKsV7QpAIDAAAJ8nsi8AAAAW

This HTTP-POST request submits the edit form after opening a red link in the main namespace, e.g. https://de.wikipedia.org/w/index.php?title=[Some_title_here]&action=edit&redlink=1. The form destination being /w/index.php?title=[Some_title_here]&action=submit.

message
[XKsV7QpAIDAAAJ8nsi8AAAAW] /w/index.php?title=####&action=submit   Wikimedia\Rdbms\DBQueryError  …
Query: INSERT INTO `redirect` (rd_from,rd_namespace,rd_title,rd_fragment,rd_interwiki) VALUES ('#######','0','##################','','') ON DUPLICATE KEY UPDATE rd_namespace = '0',rd_title = '##################',rd_fragment = '',rd_interwiki = ''
Function: WikiPage::insertRedirectEntry
Error: 1205 Lock wait timeout exceeded; try restarting transaction
trace
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1091): Wikimedia\Rdbms\DatabaseMysqlBase->upsert(string, array, array, array, string)
#4 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1464): WikiPage->insertRedirectEntry(Title)
#5 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1423): WikiPage->updateRedirectOn(Wikimedia\Rdbms\DBConnRef, Title, NULL)
#6 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/PageUpdater.php(1125): WikiPage->updateRevisionOn(Wikimedia\Rdbms\DBConnRef, Revision, integer)
#7 /srv/mediawiki/php-1.33.0-wmf.23/includes/Storage/PageUpdater.php(758): MediaWiki\Storage\PageUpdater->doCreate(CommentStoreComment, User, integer)
#8 /srv/mediawiki/php-1.33.0-wmf.23/includes/page/WikiPage.php(1916): MediaWiki\Storage\PageUpdater->saveRevision(CommentStoreComment, integer)
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/EditPage.php(2304): WikiPage->doEditContent(WikitextContent, CommentStoreComment, integer, boolean, User, string, array, integer)
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/EditPage.php(1595): EditPage->internalAttemptSave(array, boolean)
#11 /srv/mediawiki/php-1.33.0-wmf.23/includes/EditPage.php(694): EditPage->attemptSave(array)
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/actions/EditAction.php(60): EditPage->edit()
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/actions/SubmitAction.php(38): EditAction->show()
#14 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(501): SubmitAction->show()
#15 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(294): MediaWiki->performAction(Article, Title)
#16 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(867): MediaWiki->performRequest()
#17 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(517): MediaWiki->main()
#18 /srv/mediawiki/php-1.33.0-wmf.23/index.php(42): MediaWiki->run()
Notes

Might be a recent regression, or a shift in user activity.

Screenshot 2019-04-08 at 19.16.45.png (518×2 px, 93 KB)

CC-ing DBA: Do we know of a change in schema, indexes or query traffic that could explain e.g. why lock contention would be significantly worse?

CC-ing CPT: Any recent changes around PageUpdater->saveRevision or WikiPage->updateRedirectOn that might explain the issue?

So this doesn't really like something "new" right?
I mean, looks like it has been happening before, on a lower scale, but still happening:

screenshot-logstash.wikimedia.org-2019.04.08-21-53-05.png (176×1 px, 19 KB)

So this doesn't really like something "new" right?
I mean, looks like it has been happening before, on a lower scale, but still happening:
F28596096

Indeed. There have been reports to Logstash of this error message before. But there is more than one way of causing that error message.

As I understand it, the previous reports were from edge cases like T120140/T163597, which are about "redirects during page move" and "redirects with self-referential template".

The new reports are about the common use case of creating a redirect plainly for Wikipedia article names. At least from users, this is the first such report I could find.

Going to untag DBA as the infrastructure is working fine from what we can see. Will remain subscribed in case there are questions or requests for us.

daniel subscribed.

Unassigning myself, since I don't know what do to about this. I'm probably not the best person to investigate this issue, either.

How frequent is this? Should it be high priority?

More lock contention around this area, see also T223175.

@Krinkle Do you have a sense of what the level of priority for this task should be? Setting it to normal for now but please correct.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM

@WDoranWMF I'm not entirely sure. Right now it means there are a few cases a week where a user tries to create a redirect article and upon saving has their edit take a number of seconds after which they see a non-descriptive "Internal error" page mentioning "DBQueryError".

Checked to see if this was still occurring, and last logged instance was 6 occurrences between 2020-04-03T10:35:35 and 2020-04-03T10:37:36

Thanks, that's good to know.

I don't mean to sound impatient (it might be fine for this to have a low priority). But this is currently also marked as "Stalled", which usually means you're (also) blocked on someone outside the team doing something and/or providing more information. Is there something the original reporter or me can do to help move this forward? (See bug lifecycle and bug mgtm.)

@Krinkle I guess the problem is that nobody has a clue why it occurs, how to reproduce it, or how to debug it. Ideally, when a lock timeout occurs, we'd have a way to somehow capture the stack trace of the process that is holding the lock we are waiting for. I have no idea how that could be achieved, but I also don#t know how to investigate this without this ability. We can poke around the code and guess at what might be holding the lock, I suppose...

As to poking around the code and speculating:

  • The failure happens in an upsert in WikiPage::insertRedirectEntry, called from updateRedirectOn, which is called from updateRevisionOn, called from PageUpdated::doCreate. doCreate() starts an atomic section, which afaik will start a transaction, so any locks acquired while doCreate() will block any other things trying to acquire locks on the same rows.
  • I note that when insertRedirectEntry() is called from insertRedirect(), it's running in a deferred update. Perhaps updateRedirectOn() could defer to call to insertRedirectEntry() until after the current transaction, or even the end of the request?
  • I note that this happens during creation, so no other process can be holding a lock on the specific row in the redirect table, since the page didn't previously exist. This would indicate that we are waiting for a gap lock. Perhaps we are hitting something similar to T199393 here? Can this be triggered by rapidly creating multiple redirects with similar names? Or to the same page?

I don't have any new ideas, T220353#6045847 is a good summary.

  • I note that this happens during creation, so no other process can be holding a lock on the specific row in the redirect table, since the page didn't previously exist. This would indicate that we are waiting for a gap lock. Perhaps we are hitting something similar to T199393 here? Can this be triggered by rapidly creating multiple redirects with similar names? Or to the same page?

Note that MariaDB doesn't seem to create gap locks for the actual INSERT, just for other write operations (including SELECT with FOR UPDATE or LOCK IN SHARE MODE) that would try to touch missing rows. Note "would try to touch missing rows" includes an attempt to delete an rd_from that already doesn't exist, but does not include a delete of an rd_from that does exist.

There's no unique index on anything in redirect other than rd_from, so any potential gap locking would have to depend on the page's IDs rather than titles.

Note that MariaDB doesn't seem to create gap locks for the actual INSERT, just for other write operations (including SELECT with FOR UPDATE or LOCK IN SHARE MODE) that would try to touch missing rows. Note "would try to touch missing rows" includes an attempt to delete an rd_from that already doesn't exist, but does not include a delete of an rd_from that does exist.

The way Database::upsert is implemented is by attempting an update first, which would in this case fail because the row doesn't exist. If the update fails, an insert is performed. Would that update case a gap lock? Could it be helpful to replace the upsert() with replace()? That would first attempt to delete. That would probably make it worse....

Putting this into the Platform Engineering inbox for reassessment. This is a production error that we shouldn't just ignore permanently, even if fixing it isn't super urgent.

Aklapper changed the task status from Stalled to Open.Oct 19 2020, 4:33 PM

The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks...Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task needs retesting, then the TestMe tag should be added. If this task is either out of scope and nobody should ever work on this, or nobody else managed to reproduce the problem described in this task, then this task should have the "Declined" status. If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)