Page MenuHomePhabricator

Deadlock found when trying to get lock (UserOptionsManager::saveOptionsQuery)
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBQueryError: Error 1213: Deadlock found when trying to get lock; try restarting transaction (db2112)
Function: MediaWiki\User\UserOptionsManager::saveOptions
Query: INSERT IGNORE INTO `user_properties` (up_user,up_property,up_value) VALUES (XX,'VectorSkinVersion','1'),(XX,'rcenhancedfilters-seen-tour',1),(XX,'wlenhancedfilters-seen-tour',1)
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1760)
#0 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1744): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1719): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1283): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(2424): Wikimedia\Rdbms\Database->query(string, string, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(2384): Wikimedia\Rdbms\Database->doInsertNonConflicting(string, array, string)
#5 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert(string, array, string, array)
#6 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/DBConnRef.php(373): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.37.0-wmf.12/includes/user/UserOptionsManager.php(450): Wikimedia\Rdbms\DBConnRef->insert(string, array, string, array)
#8 /srv/mediawiki/php-1.37.0-wmf.12/extensions/PageTriage/includes/Hooks.php(841): MediaWiki\User\UserOptionsManager->saveOptions(User)
#9 /srv/mediawiki/php-1.37.0-wmf.12/includes/HookContainer/HookContainer.php(338): MediaWiki\Extension\PageTriage\Hooks::onLocalUserCreated(User, boolean)
#10 /srv/mediawiki/php-1.37.0-wmf.12/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#11 /srv/mediawiki/php-1.37.0-wmf.12/includes/HookContainer/HookRunner.php(2389): MediaWiki\HookContainer\HookContainer->run(string, array)
#12 /srv/mediawiki/php-1.37.0-wmf.12/includes/auth/AuthManager.php(1452): MediaWiki\HookContainer\HookRunner->onLocalUserCreated(User, boolean)
#13 /srv/mediawiki/php-1.37.0-wmf.12/includes/auth/AuthManager.php(1169): MediaWiki\Auth\AuthManager->continueAccountCreation(array)
#14 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/AuthManagerSpecialPage.php(377): MediaWiki\Auth\AuthManager->beginAccountCreation(User, array, string)
#15 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/AuthManagerSpecialPage.php(502): AuthManagerSpecialPage->performAuthenticationStep(string, array)
#16 /srv/mediawiki/php-1.37.0-wmf.12/includes/htmlform/HTMLForm.php(718): AuthManagerSpecialPage->handleFormSubmit(array, VFormHTMLForm)
#17 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/AuthManagerSpecialPage.php(435): HTMLForm->trySubmit()
#18 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/LoginSignupSpecialPage.php(318): AuthManagerSpecialPage->trySubmit()
#19 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/SpecialPage.php(646): LoginSignupSpecialPage->execute(NULL)
#20 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/SpecialPageFactory.php(1362): SpecialPage->run(NULL)
#21 /srv/mediawiki/php-1.37.0-wmf.12/includes/MediaWiki.php(314): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, RequestContext)
#22 /srv/mediawiki/php-1.37.0-wmf.12/includes/MediaWiki.php(917): MediaWiki->performRequest()
#23 /srv/mediawiki/php-1.37.0-wmf.12/includes/MediaWiki.php(551): MediaWiki->main()
#24 /srv/mediawiki/php-1.37.0-wmf.12/index.php(53): MediaWiki->run()
#25 /srv/mediawiki/php-1.37.0-wmf.12/index.php(46): wfIndexMain()
#26 /srv/mediawiki/w/index.php(3): require(string)
#27 {main}
Impact

A portion of account sign-ups are cancelled due to the CreateAccount request fatalling on this database error.

Notes

Query message:("Deadlock found when trying to get" AND "user_properties") on Logstash mediawiki-errors:

Screenshot 2021-07-12 at 22.45.54.png (324×1 px, 51 KB)

/cc @tstarling @Pchelolo This might be fallout from recent refactoring in UserOptionsManager.

I notice that a majority of these appear to be from setting VectorSkinVersion=1 during account creation. That seems harmless but since it's new-ish, mentioning it here as that might somehow be inducing the regression if it this happens in an unsafe way.

It started Wed 30 June 2021, 19:05 UTC, which aligns perfectly with the SAL with:

  • @brennen Synchronized php: group1 wikis to 1.37.0-wmf.12

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  1. use IMaintainableDatabase::deadlockLoop() to retry the transaction. Although this would require wrapping the update with onTransactionCommitOrIdle().
  2. Fix PageTriage, which shouldn't be bloating the user_properties table by unconditionally inserting rows on user creation. Per my comment at T54777#587031, we could have a default that depends on user registration date. Or we could apply the optimisation operator.

I think the main point of LOCK IN SHARE MODE is deadlock error generation. It should probably be renamed LOCK FOR DEADLOCK ERROR.

I think the main point of LOCK IN SHARE MODE is deadlock error generation. It should probably be renamed LOCK FOR DEADLOCK ERROR.

hehe :)

I think you were very right saying my previous attempt to fix it was incomplete and we just shouldn't lock at all. So I plan to do solution 2 - don't re-read the options.

The plan:

  1. Introduce a new hook instead of UserSaveOptions, we need to update it to UserIdentity anyway. New hook will only take modified options and original options, deleting the option will be done by setting it to null instead of unsetting it from a map of all current options.
  2. Update all extensions, deprecate old hook. Maybe even drop it without deprecation since it would be hard to support the old contract.
  3. Now in saveOptions we won't need to have all the options, we will only need modified options. Save those without any locking - @tstarling was totally right saying last update wins is acceptable here.

In the meantime the patch above replaces READ_LOCKING with READ_LATEST to fetch original options - it's a partial revert of my previous attempt to fix this, so even though its theoretically possible we can undo another preferences save operation, we've lived with this for a long time and it wasn't causing issues.

Change 704201 merged by jenkins-bot:

[mediawiki/core@master] Do not lock user_preferences before updating

https://gerrit.wikimedia.org/r/704201

Should this be backported to both wmf.12 and .14?

cc: @dancy as train primary this week.

Yeah, this is safe enough to backport.

Change 704181 had a related patch set uploaded (by Brennen Bearnes; author: Ppchelko):

[mediawiki/core@wmf/1.37.0-wmf.12] Do not lock user_preferences before updating

https://gerrit.wikimedia.org/r/704181

Change 704182 had a related patch set uploaded (by Brennen Bearnes; author: Ppchelko):

[mediawiki/core@wmf/1.37.0-wmf.14] Do not lock user_preferences before updating

https://gerrit.wikimedia.org/r/704182

Change 704182 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.14] Do not lock user_preferences before updating

https://gerrit.wikimedia.org/r/704182

Change 704181 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.12] Do not lock user_preferences before updating

https://gerrit.wikimedia.org/r/704181

Mentioned in SAL (#wikimedia-operations) [2021-07-13T17:05:01Z] <brennen@deploy1002> Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: [[gerrit:704181|Do not lock user_preferences before updating (T286521)]] (duration: 01m 58s)

Pchelolo lowered the priority of this task from Unbreak Now! to Medium.Jul 13 2021, 5:22 PM

After @brennen deployed the quick fix, deadlocks disappeared, so it's no longer a production error.

I still have some ideas on how to make this work better, will do it under Platform Team Workboards (MW Expedition) but lowering the priority to medium.

Frequency seems down, but @dancy just pointed out ~3 of these in the last hour. (Edit: For 1.37.0-wmf.12.)

grrrr... I guess the quick-fix has helped, I'll work on making the proper fix instead of trying to implement some more quick fixes.

Since this issue has been lingering since .12 I made it a blocker for the .14 train. .14 has been deployed to testwikis and is now blocked.

Change 704427 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/extensions/CentralAuth@master] Do not lock preferences row for a rememberpassword check

https://gerrit.wikimedia.org/r/704427

The patch above should deal with the rest of deadlocks.

Change 704427 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Do not lock preferences row for a rememberpassword check

https://gerrit.wikimedia.org/r/704427

Change 704382 had a related patch set uploaded (by Ahmon Dancy; author: Ppchelko):

[mediawiki/extensions/CentralAuth@wmf/1.37.0-wmf.12] Do not lock preferences row for a rememberpassword check

https://gerrit.wikimedia.org/r/704382

Change 704382 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.37.0-wmf.12] Do not lock preferences row for a rememberpassword check

https://gerrit.wikimedia.org/r/704382

Mentioned in SAL (#wikimedia-operations) [2021-07-14T17:00:54Z] <dancy@deploy1002> Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382|Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)

Change 704383 had a related patch set uploaded (by Ahmon Dancy; author: Ppchelko):

[mediawiki/extensions/CentralAuth@wmf/1.37.0-wmf.14] Do not lock preferences row for a rememberpassword check

https://gerrit.wikimedia.org/r/704383

Change 704383 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.37.0-wmf.14] Do not lock preferences row for a rememberpassword check

https://gerrit.wikimedia.org/r/704383

Mentioned in SAL (#wikimedia-operations) [2021-07-14T17:35:40Z] <dancy@deploy1002> Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383|Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)

Looks like there are still a handful of these comming through at https://logstash.wikimedia.org/app/dashboards#/view/mediawiki-errors. Not sure if expected or not, but here goes:

normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBQueryError: Error 1213: Deadlock found when trying to get lock; try restarting transaction (db2112)
Function: MediaWiki\User\UserOptionsManager::saveOptions
Query: INSERT IGNORE INTO `user_properties` (up_user
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1760)
#0 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1744): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1719): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(1283): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(2424): Wikimedia\Rdbms\Database->query(string, string, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/Database.php(2384): Wikimedia\Rdbms\Database->doInsertNonConflicting(string, array, string)
#5 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert(string, array, string, array)
#6 /srv/mediawiki/php-1.37.0-wmf.12/includes/libs/rdbms/database/DBConnRef.php(373): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.37.0-wmf.12/includes/user/UserOptionsManager.php(450): Wikimedia\Rdbms\DBConnRef->insert(string, array, string, array)
#8 /srv/mediawiki/php-1.37.0-wmf.12/extensions/PageTriage/includes/Hooks.php(841): MediaWiki\User\UserOptionsManager->saveOptions(User)
#9 /srv/mediawiki/php-1.37.0-wmf.12/includes/HookContainer/HookContainer.php(338): MediaWiki\Extension\PageTriage\Hooks::onLocalUserCreated(User, boolean)
#10 /srv/mediawiki/php-1.37.0-wmf.12/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#11 /srv/mediawiki/php-1.37.0-wmf.12/includes/HookContainer/HookRunner.php(2389): MediaWiki\HookContainer\HookContainer->run(string, array)
#12 /srv/mediawiki/php-1.37.0-wmf.12/includes/auth/AuthManager.php(1452): MediaWiki\HookContainer\HookRunner->onLocalUserCreated(User, boolean)
#13 /srv/mediawiki/php-1.37.0-wmf.12/includes/auth/AuthManager.php(1169): MediaWiki\Auth\AuthManager->continueAccountCreation(array)
#14 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/AuthManagerSpecialPage.php(377): MediaWiki\Auth\AuthManager->beginAccountCreation(User, array, string)
#15 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/AuthManagerSpecialPage.php(502): AuthManagerSpecialPage->performAuthenticationStep(string, array)
#16 /srv/mediawiki/php-1.37.0-wmf.12/includes/htmlform/HTMLForm.php(718): AuthManagerSpecialPage->handleFormSubmit(array, VFormHTMLForm)
#17 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/AuthManagerSpecialPage.php(435): HTMLForm->trySubmit()
#18 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/LoginSignupSpecialPage.php(318): AuthManagerSpecialPage->trySubmit()
#19 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/SpecialPage.php(646): LoginSignupSpecialPage->execute(NULL)
#20 /srv/mediawiki/php-1.37.0-wmf.12/includes/specialpage/SpecialPageFactory.php(1362): SpecialPage->run(NULL)
#21 /srv/mediawiki/php-1.37.0-wmf.12/includes/MediaWiki.php(314): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, RequestContext)
#22 /srv/mediawiki/php-1.37.0-wmf.12/includes/MediaWiki.php(917): MediaWiki->performRequest()
#23 /srv/mediawiki/php-1.37.0-wmf.12/includes/MediaWiki.php(551): MediaWiki->main()
#24 /srv/mediawiki/php-1.37.0-wmf.12/index.php(53): MediaWiki->run()
#25 /srv/mediawiki/php-1.37.0-wmf.12/index.php(46): wfIndexMain()
#26 /srv/mediawiki/w/index.php(3): require(string)
#27 {main}

Note that the train is still blocked due to remaining deadlock errors.

Ladsgroup raised the priority of this task from Medium to Unbreak Now!.Jul 14 2021, 7:10 PM
Ladsgroup subscribed.

Train blocker = UBN. wmf.14 is not moving forward because of this.

Probably someone has more experience reading mysql deadlock info, so I'll leave it here while I'm figuring out what it all means.

LATEST DETECTED DEADLOCK
------------------------
2021-07-14 18:51:40 7f0a8c449700
*** (1) TRANSACTION:
TRANSACTION 84885484147, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 10 lock struct(s), heap size 2936, 5 row lock(s), undo log entries 6
MySQL thread id 158831299, OS thread handle 0x7ee57547b700, query id 8318378384 10.192.48.96 wikiuser update
INSERT /* MediaWiki\User\UserOptionsManager::saveOptions  */ IGNORE INTO `user_properties` (up_user,up_property,up_value) VALUES (42174228,'popups','1'),(42174228,'VectorSkinVersion','1')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 768 page no 33824 n bits 328 index `PRIMARY` of table `enwiki`.`user_properties` trx table locks 5 total table locks 2  trx id 84885484147 lock_mode X insert intention waiting lock hold time 1 wait time before grant 0 
*** (2) TRANSACTION:
TRANSACTION 84885483863, ACTIVE 2 sec inserting
mysql tables in use 1, locked 1
16 lock struct(s), heap size 2936, 20 row lock(s), undo log entries 18
MySQL thread id 158831223, OS thread handle 0x7f0a8c449700, query id 8318379260 10.192.16.57 wikiuser update
INSERT /* MediaWiki\User\UserOptionsManager::saveOptions  */ IGNORE INTO `user_properties` (up_user,up_property,up_value) VALUES (42174227,'rcenhancedfilters-seen-tour',1),(42174227,'wlenhancedfilters-seen-tour',1)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 768 page no 33824 n bits 320 index `PRIMARY` of table `enwiki`.`user_properties` trx table locks 7 total table locks 2  trx id 84885483863 lock_mode X lock hold time 1 wait time before grant 0 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 768 page no 33824 n bits 328 index `PRIMARY` of table `enwiki`.`user_properties` trx table locks 7 total table locks 2  trx id 84885483863 lock_mode X insert intention waiting lock hold time 0 wait time before grant 0 
*** WE ROLL BACK TRANSACTION (1)

Change 704601 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@master] Move saving user options to onTransactionPreCommitOrIdle

https://gerrit.wikimedia.org/r/704601

Change 704608 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@wmf/1.37.0-wmf.14] Move saving user options to onTransactionPreCommitOrIdle

https://gerrit.wikimedia.org/r/704608

Change 704609 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@wmf/1.37.0-wmf.12] Move saving user options to onTransactionPreCommitOrIdle

https://gerrit.wikimedia.org/r/704609

Change 704601 merged by jenkins-bot:

[mediawiki/core@master] Move saving user options to onTransactionPreCommitOrIdle

https://gerrit.wikimedia.org/r/704601

Change 704609 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.12] Move saving user options to onTransactionPreCommitOrIdle

https://gerrit.wikimedia.org/r/704609

Change 704608 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.14] Move saving user options to onTransactionPreCommitOrIdle

https://gerrit.wikimedia.org/r/704608

Mentioned in SAL (#wikimedia-operations) [2021-07-14T20:58:16Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608|Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2021-07-14T21:08:03Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609|Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)

This didn't fix the problem, probably we have to move it to a deferred update then :/

Still seeing instances of this since the most recent patch was deployed:

2 __▁__▁_ 2057    2123    DBQueryError .12 i/l/r/d/Database:1760  Error 1213: Deadlock found when trying to get lock; try restarting transaction (db2112)
1 _____▁_ 2120    2120    DBQueryError .14 i/l/r/d/Database:1760  Error 1213: Deadlock found when trying to get lock; try restarting transaction (db2079)
1 ___▁___ 2107    2107    DBQueryError .12 i/l/r/d/Database:1760  Error 1213: Deadlock found when trying to get lock; try restarting transaction (db2079)

It looks like this might be a difficult problem to solve and having the pressure of being a train blocker may not make things better. Is there any way to back out the change that caused this in wmf.12/wmf.14 and work on a fix in mainline?

it's been 9 of them in the last 3 hours, and they're happening on wmf.12 as well. Does this actually constitute a train blocker?

I don't think the problem is with user table, it's not mentioned in the deadlock description:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-07-14 21:23:10 7ed5d10ac700
*** (1) TRANSACTION:
TRANSACTION 84888255769, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 11 lock struct(s), heap size 2936, 5 row lock(s), undo log entries 9
MySQL thread id 159901215, OS thread handle 0x7ed750871700, query id 8332643241 10.192.48.20 wikiuser update
INSERT /* MediaWiki\User\UserOptionsManager::saveOptions  */ IGNORE INTO `user_properties` (up_user,up_property,up_value) VALUES (42174775,'popups','1'),(42174775,'VectorSkinVersion','1'),(42174775,'rcenhancedfilters-seen-tour',1),(42174775,'wlenhancedfilters-seen-tour',1)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 768 page no 1633676 n bits 336 index `PRIMARY` of table `enwiki`.`user_properties` trx table locks 6 total table locks 2  trx id 84888255769 lock_mode X insert intention waiting lock hold time 0 wait time before grant 0 
*** (2) TRANSACTION:
TRANSACTION 84888255612, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
18 lock struct(s), heap size 2936, 19 row lock(s), undo log entries 24
MySQL thread id 159901152, OS thread handle 0x7ed5d10ac700, query id 8332643282 10.192.0.48 wikiuser update
INSERT /* MediaWiki\User\UserOptionsManager::saveOptions  */ IGNORE INTO `user_properties` (up_user,up_property,up_value) VALUES (42174774,'rcenhancedfilters-seen-tour',1),(42174774,'wlenhancedfilters-seen-tour',1),(42174774,'welcomesurvey-responses','{\"_group\":\"NONE\",\"_render_date\":\"20210714212310\"}')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 768 page no 1633676 n bits 336 index `PRIMARY` of table `enwiki`.`user_properties` trx table locks 9 total table locks 2  trx id 84888255612 lock_mode X lock hold time 0 wait time before grant 0 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 768 page no 1633676 n bits 336 index `PRIMARY` of table `enwiki`.`user_properties` trx table locks 9 total table locks 2  trx id 84888255612 lock_mode X insert intention waiting lock hold time 0 wait time before grant 0 
*** WE ROLL BACK TRANSACTION (1)

UserOptionsManager on saveSettings does a DELETE first and then an INSERT. The DELETE is done to clean out the rows that are now default.

So,

  1. TRX2 makes a delete. delete grabs a next-key lock which is a combination of a record lock and a gap lock for the keys immediately before the deleted row.
  2. TRX1 attempt to make an insert, tries to grab the lock, before grabbing the exclusive index lock, it needs an insert intention gap lock - blocked by 1.
  3. TRX2 attempt an insert, same thing.
  4. Deadlock.

I think the reason for these deadlocks is @tstarling patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/700722 - we now are much better in detecting that the DB row is the same as default (prior we were terrible at that because preferences are a type mess), so we issue much more deletes now - it's cleaning up a ton of unnecessary rows already in the database.

What we could potentially do is to issue cleanup deletes in the deferred update. And instead of inserting, we can upsert.

Is there a maintenance script to clean up the user_options table?

it's been 9 of them in the last 3 hours, and they're happening on wmf.12 as well. Does this actually constitute a train blocker?

After discussion in RelEng, we think probably not, at least for this train. We'd like to treat this as:

  1. Still UBN per @Krinkle's original assessment, because it does represent user-facing breakage. (If it's not that high a severity, somebody more qualified than me can make that call.)
  2. Not a blocker for wmf.14 since the train rolling forward has no bearing on whether the bug is present, and it's not happening at the kind of rate that makes deployment unsafe.
  3. A blocker for wmf.15 on the same logic we're using for deprecation warnings, since it does represent some amount of ongoing log noise.

@brennen thank you, this sounds right.

The alternative is to revert Tim's patch I reference above which will make us write more to the DB and will bring back T280220, which I think was happening with more frequency.

There's a number of things we can try, but the first one is that there's a cleanupPreferences script, but it doesn't do what we need it to do. I'll add support for the type of cleanup we want.

Change 704639 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/core@master] cleanupPreferences: option for deleting default values

https://gerrit.wikimedia.org/r/704639

Change 704642 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/core@master] Defer cleaning up user_properties rows if they match default.

https://gerrit.wikimedia.org/r/704642

UserOptionsManager on saveSettings does a DELETE first and then an INSERT. The DELETE is done to clean out the rows that are now default.

So,

  1. TRX2 makes a delete. delete grabs a next-key lock which is a combination of a record lock and a gap lock for the keys immediately before the deleted row.
  2. TRX1 attempt to make an insert, tries to grab the lock, before grabbing the exclusive index lock, it needs an insert intention gap lock - blocked by 1.
  3. TRX2 attempt an insert, same thing.
  4. Deadlock.

I wasn't able to reproduce a gap lock for a deletion of a row that exists, but for deletion of a non-existent row, it does indeed deadlock as you describe:

$ mysql --prompt 'A> '
$ mysql --prompt 'B> '
A> begin;
Query OK, 0 rows affected (0.000 sec)

B> begin;
Query OK, 0 rows affected (0.000 sec)

A> delete from user_properties where up_user=100 and up_property in ('test1','test2','test3');
Query OK, 0 rows affected (0.001 sec)

B> delete from user_properties where up_user=101 and up_property in ('test1','test2','test3');
Query OK, 0 rows affected (0.001 sec)

A> insert into user_properties (up_user,up_property,up_value) values (100, 'test1', '1');
(waits)

B> insert into user_properties (up_user,up_property,up_value) values (101, 'test','1');
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Change 704649 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/core@master] Don't try to delete non-existent rows when saving options.

https://gerrit.wikimedia.org/r/704649

Change 704642 abandoned by Ppchelko:

[mediawiki/core@master] Defer cleaning up user_properties rows if they match default.

Reason:

https://gerrit.wikimedia.org/r/704642

Change 704649 merged by jenkins-bot:

[mediawiki/core@master] Don't try to delete non-existent rows when saving options.

https://gerrit.wikimedia.org/r/704649

Change 704618 had a related patch set uploaded (by Tim Starling; author: Ppchelko):

[mediawiki/core@wmf/1.37.0-wmf.12] Don't try to delete non-existent rows when saving options.

https://gerrit.wikimedia.org/r/704618

Change 704619 had a related patch set uploaded (by Tim Starling; author: Ppchelko):

[mediawiki/core@wmf/1.37.0-wmf.14] Don't try to delete non-existent rows when saving options.

https://gerrit.wikimedia.org/r/704619

Change 704618 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.12] Don't try to delete non-existent rows when saving options.

https://gerrit.wikimedia.org/r/704618

Change 704619 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.14] Don't try to delete non-existent rows when saving options.

https://gerrit.wikimedia.org/r/704619

Mentioned in SAL (#wikimedia-operations) [2021-07-15T06:47:20Z] <tstarling@deploy1002> Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2021-07-15T06:49:51Z] <tstarling@deploy1002> Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 06s)

Pchelolo lowered the priority of this task from Unbreak Now! to High.Jul 15 2021, 2:39 PM

I'll have a look at the last one, but 1 error in 5 hours is definitely not a UBN anymore.

Pchelolo claimed this task.

WE've had none of these in the last 4 days, I think it's resolved. Other improvements I want to do to UserOptionsManager will be done under T286576

Change 704639 abandoned by Ppchelko:

[mediawiki/core@master] cleanupPreferences: option for deleting default values

Reason:

not needed

https://gerrit.wikimedia.org/r/704639