
Multiple page or file undelete/restore requests around the same time can deadlock (Fatal DBQueryError)
Open, Needs Triage · Public

Description

(I'm not smart enough to know for a fact that this hasn't already been reported; there are a number of similar tickets, but those that are not closed appear to be somewhat different.)

Basically, sending a batch of action=undelete requests can produce multiple internal_api_error_DBQueryError errors. Each is logged as a Wikimedia\Rdbms\DBQueryError with a different exception ID, such as XMrEMQpAMDwAAGzj1h4AAAAW or XMtbogpAAEwAAA7kBZ0AAACO.

Sometimes it works fine, and sometimes 20%-80% of the requests fail, even after attempting them all twice. This has been going on for at least a few months, possibly much longer. It happens with Twinkle's batchundelete module on enwiki (and on testwiki). The same code is used for batchdelete, which doesn't produce these errors.

2019-05-02 10:19:29 [XMrEMQpAMDwAAGzj1h4AAAAW] mw1225 enwiki 1.34.0-wmf.1 exception ERROR: [XMrEMQpAMDwAAGzj1h4AAAAW] /w/api.php   Wikimedia\Rdbms\DBQueryError from line 1587 of /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: INSERT IGNORE INTO `page` (page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_latest,page_len,page_id) VALUES ('2','Amorymeltzer/sandbox/5','','0','1','0.206380669357','20190502101929','0','0','60635647')
Function: WikiPage::insertOn
Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.32.64)
 {"exception_id":"XMrEMQpAMDwAAGzj1h4AAAAW","exception_url":"/w/api.php","caught_by":"mwe_handler"} 
[Exception Wikimedia\Rdbms\DBQueryError] (/srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/database/Database.php:1587) A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: INSERT IGNORE INTO `page` (page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_latest,page_len,page_id) VALUES ('2','Amorymeltzer/sandbox/5','','0','1','0.206380669357','20190502101929','0','0','60635647')
Function: WikiPage::insertOn
Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.64.32.64)

  #0 /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/database/Database.php(1556): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
  #1 /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/database/Database.php(1274): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
  #2 /srv/mediawiki/php-1.34.0-wmf.1/includes/libs/rdbms/database/Database.php(2149): Wikimedia\Rdbms\Database->query(string, string)
  #3 /srv/mediawiki/php-1.34.0-wmf.1/includes/page/WikiPage.php(1345): Wikimedia\Rdbms\Database->insert(string, array, string, string)
  #4 /srv/mediawiki/php-1.34.0-wmf.1/includes/page/PageArchive.php(735): WikiPage->insertOn(Wikimedia\Rdbms\DatabaseMysqli, string)
  #5 /srv/mediawiki/php-1.34.0-wmf.1/includes/page/PageArchive.php(506): PageArchive->undeleteRevisions(array, boolean, string)
  #6 /srv/mediawiki/php-1.34.0-wmf.1/includes/api/ApiUndelete.php(74): PageArchive->undelete(array, string, NULL, boolean, User, NULL)
  #7 /srv/mediawiki/php-1.34.0-wmf.1/includes/api/ApiMain.php(1593): ApiUndelete->execute()
  #8 /srv/mediawiki/php-1.34.0-wmf.1/includes/api/ApiMain.php(531): ApiMain->executeAction()
  #9 /srv/mediawiki/php-1.34.0-wmf.1/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
  #10 /srv/mediawiki/php-1.34.0-wmf.1/api.php(87): ApiMain->execute()
  #11 /srv/mediawiki/w/api.php(3): require(string)
  #12 {main}
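
Error 1213 in the log above asks the client to restart the transaction. As an illustration of what that means (not a description of MediaWiki's actual handling), here is a minimal sketch assuming a hypothetical `DeadlockError` exception that carries the server error number and a hypothetical `run_transaction` callable that begins, performs and commits the whole unit of work. The point is that the entire transaction is re-run from the start, not just the failed INSERT, because the deadlock victim's transaction has already been rolled back.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

ER_LOCK_DEADLOCK = 1213  # MySQL/MariaDB error number shown in the log


class DeadlockError(Exception):
    """Hypothetical exception type carrying the server error number."""

    def __init__(self, errno: int, message: str = "") -> None:
        super().__init__(message)
        self.errno = errno


def run_with_deadlock_retry(
    run_transaction: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.05,
) -> T:
    """Run the whole transaction, restarting it from scratch on error 1213."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except DeadlockError as err:
            # Only deadlocks are retried; other errors, and a deadlock on the
            # final attempt, propagate to the caller unchanged.
            if err.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # The victim transaction was rolled back by the server; wait a bit
            # (longer on each attempt) so the competing transaction can finish.
            time.sleep(base_delay * attempt)
    raise AssertionError("unreachable")
```

Retrying only on 1213 keeps genuine failures (such as a lock wait timeout, error 1205) visible instead of silently looping on them.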

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 2 2019, 9:08 PM
Reedy updated the task description.
Anomie moved this task from Unsorted to Non-core-API stuff on the MediaWiki-API board.
Anomie added a subscriber: Anomie.

I don't think this is likely to be an issue in the API itself, but rather in the deletion code used by the API (and by the normal web deletion action as well). My best guess, though, is that it comes down to tricky details of MySQL/MariaDB gap locking, which I can't say I'm very familiar with.

If Twinkle is sending its action=delete requests in parallel, you might want to see if sending them serially (i.e. waiting for each one to complete before sending the next) avoids the issue. If it's already working serially, adding a short delay between retries might help.
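
A minimal sketch of the serial approach, with a short delay before each retry. It is written in Python for illustration only (Twinkle itself is JavaScript); `undelete` is a hypothetical callable standing in for one action=undelete request that returns the API error code as a string, or None on success. The retried code, internal_api_error_DBQueryError, is the one reported in this task.

```python
import time
from typing import Callable, Dict, Iterable, Optional

DB_QUERY_ERROR = "internal_api_error_DBQueryError"


def undelete_serially(
    titles: Iterable[str],
    undelete: Callable[[str], Optional[str]],  # hypothetical request wrapper
    retries: int = 1,
    retry_delay: float = 1.0,
) -> Dict[str, Optional[str]]:
    """Restore pages one at a time; map each title to its final error code (None = success)."""
    results: Dict[str, Optional[str]] = {}
    for title in titles:
        # Each request completes before the next one starts, so this client's
        # own requests never hold database locks at the same time.
        error = undelete(title)
        attempts_left = retries
        while error == DB_QUERY_ERROR and attempts_left > 0:
            time.sleep(retry_delay)  # give any competing transaction time to finish
            error = undelete(title)
            attempts_left -= 1
        results[title] = error
    return results
```

Serializing only removes contention between one client's own requests; requests from other users restoring pages at the same time can still collide.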

@Anomie It's done in parallel, but the issue here arises specifically with undelete, not with delete, even though both are handled in the same way. That's what raised my eyebrows, since I wouldn't expect delete and undelete to behave/react differently.

Anomie added a comment. May 8 2019, 3:31 PM

Err, yeah, I meant "action=undelete requests".

As I said, I suspect it's MariaDB gap locking when reinserting the page rows. There isn't an actual row with page_id=60635647, so a SELECT FOR UPDATE or an INSERT locks the "gap" between the previous and next rows that do exist. Gap locks taken by SELECTs don't conflict with each other, so if two processes lock the same gap, neither blocks the other. But once they reach the INSERT, process A's insert is blocked by process B's gap lock and process B's insert is blocked by process A's gap lock. That cycle is the deadlock.
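
The interleaving described above can be modelled with a toy lock table and a wait-for graph. This is a simplified sketch, not InnoDB's real lock manager: it assumes only two lock kinds on a single gap, a gap lock (from the locking SELECT) that is compatible with other transactions' gap locks, and an insert-intention lock (needed by the INSERT) that conflicts with a gap lock held by a different transaction. The class name `GapLockTable` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

GAP = "gap"
INSERT_INTENTION = "insert_intention"


class GapLockTable:
    """Toy model of the locks on one index gap (not InnoDB's implementation)."""

    def __init__(self) -> None:
        self.granted: List[Tuple[str, str]] = []  # (transaction, lock kind)
        self.waits_for: Dict[str, Set[str]] = defaultdict(set)

    def _blockers(self, txn: str, kind: str) -> Set[str]:
        # Gap locks never conflict with each other; an insert-intention lock
        # conflicts with a gap lock held by a *different* transaction.
        if kind == GAP:
            return set()
        return {t for t, k in self.granted if k == GAP and t != txn}

    def request(self, txn: str, kind: str) -> bool:
        """Grant the lock if possible; otherwise record whom txn waits for."""
        blockers = self._blockers(txn, kind)
        if blockers:
            self.waits_for[txn] |= blockers
            return False
        self.granted.append((txn, kind))
        return True

    def has_deadlock(self) -> bool:
        """Detect a cycle in the wait-for graph with a depth-first search."""
        visiting: Set[str] = set()
        done: Set[str] = set()

        def visit(node: str) -> bool:
            if node in visiting:
                return True  # back edge: we returned to a node on the current path
            if node in done:
                return False
            visiting.add(node)
            if any(visit(nxt) for nxt in self.waits_for.get(node, ())):
                return True
            visiting.discard(node)
            done.add(node)
            return False

        return any(visit(t) for t in list(self.waits_for))


if __name__ == "__main__":
    gap = GapLockTable()
    assert gap.request("A", GAP)                   # A: SELECT FOR UPDATE on the missing page_id
    assert gap.request("B", GAP)                   # B: same gap, compatible with A's lock
    assert not gap.request("A", INSERT_INTENTION)  # A's INSERT waits for B
    assert not gap.request("B", INSERT_INTENTION)  # B's INSERT waits for A
    assert gap.has_deadlock()
```

With a single transaction the insert-intention lock is granted immediately, which matches the observation that serial requests avoid the problem.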

@Anomie Thanks for that explanation. If I understand correctly, this means that in general we cannot reliably let multiple admins on the same wiki use the "page restoration" and "file undelete" features at the same time (i.e. the issue is not specific to a single user performing actions in multiple tabs).

For example, on Commons, when admins work through a backlog of requests, it's not uncommon to handle several at the same time, in quick succession, or in collaboration with other admins. (<= Keywords for Phab search)

Krinkle renamed this task from Bulk API undeletion results in database query error to Multiple page or file undelete/restore requests around the same time can deadlock (Fatal DBQueryError). Thu, May 30, 12:20 AM