Page MenuHomePhabricator

Selenium tests sometimes fail due to deadlock in User::addToDatabase from api.php?action=createaccount
Closed, ResolvedPublic

Description

I can see reproducible (testcase is attached) DBQueryError exceptions when multiple users try to CreateAccount via API simultaneously.

Function: User::addToDatabase\nError: 1213 Deadlock found when trying to get lock; try restarting transaction

I've been running a Selenium test in 5 browsers. The test starts with creating an account via api.php?action=createaccount, which sometimes fails with DBQueryError failures.
So I wrote a small testcase "send 10 async requests to api.php?action=createaccount". It fails 100% of the time (both for latest MediaWiki 1.31 and legacy MediaWiki 1.27).

Reproducible with 10.2.16-MariaDB, 10.3.8-MariaDB, MySQL 5.6.

Testcase to reproduce

npm install
API=http://somewiki/w/api.php node ReproduceBug.js

Related Objects

StatusSubtypeAssignedTask
OpenNone
ResolvedFeaturekostajh
OpenNone
ResolvedPeter
ResolvedNone
ResolvedNone
Resolvedawight
OpenNone
Resolvedcscott
Resolvedkostajh
Resolvedkostajh
Resolvedkostajh
Resolvedhashar
OpenNone
ResolvedPRODUCTION ERRORhoo
ResolvedDreamy_Jazz
OpenNone
ResolvedKrinkle
Resolvedhashar

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 752149 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/tools/api-testing@master] Prefix titles/usernames with random strings

https://gerrit.wikimedia.org/r/752149

In T199393#4420174 @Anomie got more detailed via MariaDB innodb_print_all_deadlocks and the RECORD LOCKS entries have:

n bits144
indexuser_name

The test used T199393.153140698912290 and T199393.153140698912291 and Anomie mentions they end up locking the same gap. It might be the n bits and tentatively the lock would be held solely against the first 144 bits (or 18 bytes) of the field. Both usernames share the same first 18 bytes: T199393.1531406989

That leads me to https://mariadb.com/kb/en/innodb-lock-modes/#gap-locks or https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-gap-locks

The doc states:

If a statement uses all the columns of a unique index to search for unique row, gap locking is not used.

The user_name field as a unique index, MySQL states at https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html that there is no gap lock for INSERT but there is still an "intent lock".

The usernames are made unique using a timestamp, maybe we can just reverse the username or we could provide a functionality in wdio-mediawiki which would provide a unique username which first bytes vary enough that we would not be subject to the gap locking. Then looking at the api testing database errors:

From User::addToDatabase, Deadlock found when trying to get lock; try restarting transaction
INSERT IGNORE INTO user (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched)

It had the following values:

VALUES ('Mindyoj759hBix9','','','',NULL,'','ce1a2dc893f4e96a910874933bfba74e','20220106212220',0,'20220106212220')
VALUES ('SamBSfS8YKS4c','','','',NULL,'','f95af29cde6caa2f166e1f9f0f03b5f3','20220106212220',0,'20220106212220')
VALUES ('MindykQGNKOhn9Y','','','',NULL,'','81279df004949914ef099cd4045cd231','20220106212220',0,'20220106212220')
VALUES ('AliceIvuoxLxXnE','','','',NULL,'','0f61b5f2fcd9d0cecad9aa76e724d43f','20220106212220',0,'20220106212220')
VALUES ('Alice3CJMPIaK17','','','',NULL,'','1874ed0a7917d4f159ecccc796b900d9','20220106212220',0,'20220106212220')
VALUES ('AliceV6V9Sv3C5O','','','',NULL,'','edb918b4b33d9c58e713ddcaf7dbaa73','20220106212220',0,'20220106212220')
VALUES ('63VHHrv1yMPFmpVyAt','','','',NULL,'','6457e8e0ab304a5ff0ae8ee3aadcb779','20220106212220',0,'20220106212220')
VALUES ('Kam r3DFlznNch','','','',NULL,'','fe1fd4bd8efd4ad23e0decf585096c07','20220106212221',0,'20220106212221')
VALUES ('BobntBsLlTnG8','','','',NULL,'','be3b457e09b49fe5232dec7d03a97a52','20220106212223',0,'20220106212223')
VALUES ('Fiona ikexEGwES6','','','',NULL,'','0e2f1e984e61c23aefa9486d949f0339','20220106212223',0,'20220106212223')

The user_name entries are largely different, but maybe those are due to a lock on user_registration and or user_touched :\

Change 752149 abandoned by Kosta Harlan:

[mediawiki/tools/api-testing@master] Prefix titles/usernames with random strings

Reason:

This doesn't work

https://gerrit.wikimedia.org/r/752149

Change 752356 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/tools/api-testing@master] Retry account creation to work around deadlock

https://gerrit.wikimedia.org/r/752356

Change 752143 abandoned by Kosta Harlan:

[mediawiki/core@master] Speculative fix for deadlock in User::addToDatabase() when api.php?action=createaccount

Reason:

This doesn't work. I think it's going to be more reasonable to deal with this in the testing tool by retrying creation attempts. Id309705ead69026ccc26bdf377c36bbf66dda372

https://gerrit.wikimedia.org/r/752143

Change 752356 merged by jenkins-bot:

[mediawiki/tools/api-testing@master] Retry account creation to work around deadlock

https://gerrit.wikimedia.org/r/752356

@kostajh indeed looks like blindly retrying the account creation is the easiest path forward. Well at least for api-testing / CI Thx!

Change 753533 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/DiscussionTools@master] PreferenceHooks: Don't call saveSettings in LocalUserCreated

https://gerrit.wikimedia.org/r/753533

Change 753534 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] EchoHooks: Don't call saveSettings in LocalUserCreated

https://gerrit.wikimedia.org/r/753534

Change 753522 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] [WIP] Run saveOptionsInternal in DeferredUpdate

https://gerrit.wikimedia.org/r/753522

Change 753534 merged by jenkins-bot:

[mediawiki/extensions/Echo@master] EchoHooks: Don't call saveSettings in LocalUserCreated

https://gerrit.wikimedia.org/r/753534

Change 753533 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] PreferenceHooks: Don't call saveSettings in LocalUserCreated

https://gerrit.wikimedia.org/r/753533

Change 802189 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] auth: Improve documentation on LocalUserCreatedHook

https://gerrit.wikimedia.org/r/802189

Change 802189 merged by jenkins-bot:

[mediawiki/core@master] auth: Improve documentation on LocalUserCreatedHook

https://gerrit.wikimedia.org/r/802189

Zabe added a subscriber: Lucas_Werkmeister_WMDE.
Zabe subscribed.
In T199393#4420174, Anomie wrote:

...
Running MariaDB with innodb_print_all_deadlocks, the deadlock reports:
...

I have filed T342088 to get Quibble to run MariaDB with innodb_print_all_deadlocks.

Change 939660 had a related patch set uploaded (by Hashar; author: Hashar):

[mediawiki/core@master] wdio-mediawiki: increase bit entropy of getTestString()

https://gerrit.wikimedia.org/r/939660

Here is a deadlock report from a recent experiment with Selenium tests:

2023-07-19 10:37:59 134 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2023-07-19 10:37:59 134 [Note] InnoDB: 
*** (1) TRANSACTION:

TRANSACTION 1588, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 8 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 3
MySQL thread id 138, OS thread handle 140248813016832, query id 3403 localhost wikiuser Update
INSERT /* User::addToDatabase  */ IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched) VALUES ('User-0.11215110429192343-Iñtërnâtiônà lizætiøn','','','',NULL,'','72f6d6ec68cd6f09ad1c8a29fa406426','20230719103759',0,'20230719103759')
2023-07-19 10:37:59 134 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 57 page no 4 n bits 72 index user_name of table `wikidb`.`user` trx id 1588 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 9; hex 57696b6941646d696e; asc WikiAdmin;;
 1: len 4; hex 00000001; asc     ;;

2023-07-19 10:37:59 134 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 1583, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
8 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 3
MySQL thread id 134, OS thread handle 140248814552832, query id 3409 localhost wikiuser Update
INSERT /* User::addToDatabase  */ IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched) VALUES ('User-0.32064525160185897-Iñtërnâtiônà lizætiøn','','','',NULL,'','72ae956140f4bb4253f98a3bf11bf032','20230719103759',0,'20230719103759')
2023-07-19 10:37:59 134 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 57 page no 4 n bits 72 index user_name of table `wikidb`.`user` trx id 1583 lock mode S locks gap before rec
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 9; hex 57696b6941646d696e; asc WikiAdmin;;
 1: len 4; hex 00000001; asc     ;;

2023-07-19 10:37:59 134 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 57 page no 4 n bits 72 index user_name of table `wikidb`.`user` trx id 1583 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 9; hex 57696b6941646d696e; asc WikiAdmin;;
 1: len 4; hex 00000001; asc     ;;

2023-07-19 10:37:59 134 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

Which has "RECORD LOCKS space id 57 page no 4 n bits 72 index user_name of table wikidb.user" with the user_name fields being:

User-0.11215110429192343-Iñtërnâtiônà lizætiøn
User-0.32064525160185897-Iñtërnâtiônà lizætiøn'

Which share the same first 7 characters (7 × 8 = 56 bits < 72 bits ¿?), but maybe Innodb lock the whole gap between existing records.

Change 939660 abandoned by Hashar:

[mediawiki/core@master] wdio-mediawiki: increase bit entropy of getTestString()

Reason:

That is not solving the issue: I have misunderstood how Innodb gap locking is working :)

https://gerrit.wikimedia.org/r/939660

I have updated all the Jenkins jobs to use Quibble 1.5.6 which now spawns MariaDB with --innodb-print-all-deadlocks. Thus if I got it right, when a deadlock occurs details should be available in the log attached to the builds as log/mysql-error.log.

Change 753522 abandoned by Kosta Harlan:

[mediawiki/core@master] [WIP] Run saveOptionsInternal in DeferredUpdate

Reason:

https://gerrit.wikimedia.org/r/753522

Krinkle renamed this task from Reproducible deadlock in User::addToDatabase() when api.php?action=createaccount is called simultaneously by several users to Selenium tests sometimes fail due to deadlock in User::addToDatabase from api.php?action=createaccount.Aug 19 2024, 7:55 PM
Krinkle claimed this task.
Krinkle removed a project: Patch-Needs-Improvement.

This is an updated version of the script by @edwardspec. Main changes are

  • to improve error detection because a number of errors were not "caught" so it made it seem that it was working when it wasn't, for example when account creation fails for reasons relating to tokens, password strength, sessions, or auth providers.
  • the password strength was causing the creation to fail, so I made it use use a stronger password because the previous one is now rejected by default on a new and unmodified MediaWiki install.
  • Log when creation succeeds.

1/* T371040: Reproduce a MySQL deadlock
2 when creating several MediaWiki accounts at the same time */
3
4'use strict';
5
6const MW_API_URL = 'http://localhost:4000/api.php';
7const CONCURRENCY = 20;
8
9/*---------------------------------------------------------------------------*/
10
11const MWBot = require( 'mwbot' );
12const request = require( 'request' );
13
14for ( var i = 0; i < CONCURRENCY; i ++ ) {
15 makeTestUser();
16}
17
18async function makeTestUser() {
19 var username = 'Test ' + Date.now() + ' ' + Math.random();
20 var password = 'SomePasswordThatsReasonablyUnique';
21
22 var bot = new MWBot( {
23 apiUrl: MW_API_URL,
24 verbose: true
25 }, { jar: request.jar() } );
26
27 await bot.getCreateaccountToken();
28 var resp;
29 try {
30 resp = await bot.request( {
31 action: 'createaccount',
32 createreturnurl: 'http://example.test/', // Unused
33 createtoken: bot.createaccountToken,
34 username: username,
35 password: password,
36 retype: password
37 } );
38 } catch (error) {
39 var apiResponse = error.response.error;
40 var exceptionText = apiResponse['*'];
41 delete apiResponse['*'];
42 console.error( 'Error from api.php: ', apiResponse, exceptionText );
43 return;
44 }
45
46 if (resp.createaccount.status !== 'PASS') {
47 console.error( 'Error from api.php?action=createaccount: ', resp.createaccount.status, resp.createaccount.message );
48 return;
49 }
50
51 console.log('Created [[User:' + username + ']]');
52}

Running it on Quickstart with the default sqlite, does not reproduce the error. I ran it with concurrency=8 on the PHP server and concurrency=20 on the account creation script. All succeeded. Switching to MySQL:

  • mv LocalSettings.php LocalSettings.sqlite.php
  • brew install mariadb; mysql -uroot -proot; mysql CREATE DATABASE my_wiki;
  • http://localhost:4000 to follow the web installer to use the MariaDB database
  • move the downloaded LocalSettings.php file in the MW folder.
  • I then logged-in with the Admin account and created 1 account manually to confirm that this works.
  • add require_once "$IP/includes/DevelopmentSettings.php"; to enable debug logging to the composer serve command-line output.
script output C20
$ node tmp2.js
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[5b5afaaf0f73ccddfac5350a] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[e71de91c68e559b045fb27f5] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[b8dc289424ea8ea728e20e89] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[8d82606f52ddc0214ce1ed4f] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[1f7fe2619eae98a5ae367b77] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[2722fe2246dadf347ec1fa7c] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[ce845be551dcf7fac59201c5] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Created [[User:Test 1724532117097 0.8937994270776819]]
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[ed81aa2cbac1313da59cbbbe] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[f2bad040f779aa6e4fae25e4] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[86cbd53d646a6141b8b064e4] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[47346b186ee6d80ea954e750] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Created [[User:Test 1724532117098 0.4997028520023783]]
Created [[User:Test 1724532117091 0.26343240272892343]]
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[ff20449e15bdb0e8b31da358] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[1292bcd05577f58f72b6835b] Caught exception of type Wikimedia\\Rdbms\\DBQueryError',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} undefined
Created [[User:Test 1724532117097 0.9279752336243072]]
Created [[User:Test 1724532117094 0.6194891546761487]]
Created [[User:Test 1724532117095 0.6972772046645386]]
Created [[User:Test 1724532117095 0.08528526517402923]]

The output logs 16 failures before the first success. Only the last 4 to complete were succesful. Re-running this a few times, we ocasionally see one or two earlier one succeed, but the majority fail.

With concurrency=1, success looks like this (manually reduced from the debug output to highlight some notable entries):

debug.log C1 api tokens
127.0.0.1:54094 Accepted
Start request POST /api.php?format=json
USER-AGENT: mwbot/2.1.3
CONTENT-LENGTH: 43
[session] SessionManager using store SqlBagOStuff
[objectcache] MainWANObjectCache using store EmptyBagOStuff
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" is unsaved, marking dirty in constructor
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" save: dataDirty=1 metaDirty=1 forcePersist=0
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" data dirty due to dirty(): SessionManager->getSessionForRequest
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" save: dataDirty=1 metaDirty=0 forcePersist=0
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" force-persist due to persist()
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" save: dataDirty=0 metaDirty=1 forcePersist=1
[session] Persisting session for unknown reason
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" Taking over PHP session
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" save: dataDirty=0 metaDirty=1 forcePersist=1
…
[rdbms] SqlBagOStuff::modifyTableSpecificBlobsForSet localhost: REPLACE INTO `objectcache` (keyname,value,exptime) VALUES ('my_wiki:MWSession:su01jm4ammun996hcevem008dj45n5c5',…)
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" data dirty due to dirty(): ApiQueryTokens->execute/…
…
[session] SessionBackend "su01jm4ammun996hcevem008dj45n5c5" save: dataDirty=1 metaDirty=0 forcePersist=0
…
[rdbms] SqlBagOStuff::modifyTableSpecificBlobsForSet localhost: REPLACE INTO `objectcache` (keyname,value,exptime) VALUES ('my_wiki:MWSession:su01jm4ammun996hcevem008dj45n5c5',…)
MediaWiki\MediaWikiEntryPoint::commitMainTransaction: session changes committed
…
[rdbms] MediaWiki\MediaWikiEntryPoint::commitMainTransaction localhost: COMMIT
debug.log C1 api createaccount
Start request POST /api.php?format=json
USER-AGENT: mwbot/2.1.3
COOKIE: my_wiki_session=su01jm4ammun996hcevem008dj45n5c5; UseDC=master
CONTENT-LENGTH: 260
…
[session] Session "su01jm4ammun996hcevem008dj45n5c5" requested without UserID cookie
…
[rdbms] SqlBagOStuff::fetchBlobs localhost: SELECT keyname,value,exptime FROM `objectcache` WHERE keyname = 'my_wiki:MWSession:su01jm4ammun996hcevem008dj45n5c5' …)
…
[SQLBagOStuff] MicroStash using store SqlBagOStuff
[BlockManager] Create account block cache miss with key BlockCacheKey{request=#246,user=#354,replica}
[BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#354,primary}
…
[rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad localhost: SELECT … FROM `block` … WHERE ((((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[rdbms] SqlBagOStuff::fetchBlobs localhost: SELECT keyname,value,exptime FROM `objectcache` WHERE keyname = 'my_wiki:messages:en' …)
[rdbms] SqlBagOStuff::doLock localhost: SELECT IF(GET_LOCK('my_wiki:messages:en',0),…) AS acquired
…
[rdbms] SqlBagOStuff::doUnlock localhost: SELECT RELEASE_LOCK('my_wiki:messages:en') AS released
…
[rdbms] TemporaryPasswordPrimaryAuthenticationProvider::testUserExists localhost: SELECT user_id FROM `user` WHERE user_name = 'Test 1724532771985' LIMIT 1 LOCK IN SHARE MODE
[rdbms] LocalPasswordPrimaryAuthenticationProvider::testUserExists localhost: SELECT user_id FROM `user` WHERE user_name = 'Test 1724532771985' LIMIT 1 LOCK IN SHARE MODE
[rdbms] User::load localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724532771985' LIMIT 1 LOCK IN SHARE MODE
…
[rdbms] SqlBagOStuff::modifyTableSpecificBlobsForSet localhost: REPLACE INTO `objectcache` (keyname,value,exptime) VALUES ('my_wiki:MWSession:su01jm4ammun996hcevem008dj45n5c5',…)
…
[BlockManager] Create account block cache miss with key BlockCacheKey{request=#246,user=#643,replica}
[rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad localhost: SELECT … FROM `block` … WHERE ((((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[rdbms] User::load localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724532771985' LIMIT 1 LOCK IN SHARE MODE
[rdbms] User::load localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724532771985' LIMIT 1
[authentication] AuthManager::continueAccountCreation: Primary creation passed by LocalPasswordPrimaryAuthenticationProvider
[authentication] Creating user Test 1724532771985 during account creation
…
[rdbms] startAtomic: entering level 0 (User::addToDatabase)
[rdbms] User::addToDatabase localhost: INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Test 1724532771985','','','',NULL,'','b4c6614f90694618dc83cc2bde21b9e6','20240824205252',0,'20240824205252',0)
[rdbms] ActorStore::findActorIdInternal localhost: SELECT actor_user,actor_name,actor_id FROM `actor` WHERE actor_name = 'Test 1724532771985' LIMIT 1
[rdbms] ActorStore::acquireActorId localhost: INSERT IGNORE INTO `actor` (actor_user,actor_name) VALUES (65,'Test 1724532771985')
[rdbms] endAtomic: leaving level 0 (User::addToDatabase)
…
[rdbms] Options\LocalUserOptionsStore::fetch localhost: SELECT up_property,up_value FROM `user_properties` WHERE up_user = 65
…
[rdbms] startAtomic: entering level 0 (User::saveSettings)
[rdbms] User::saveSettings localhost: UPDATE `user` SET user_name = 'Test 1724532771985',user_real_name = '',user_email = '',user_email_authenticated = NULL,user_touched = '20240824205253',user_token = 'f6a68b2f678f43caa1ca776928d61e50',user_email_token = '',user_email_token_expires = NULL WHERE user_id = 65 AND user_touched = '20240824205252'
[rdbms] User::saveSettings localhost: UPDATE `actor` SET actor_name = 'Test 1724532771985' WHERE actor_user = 65
[rdbms] endAtomic: leaving level 0 (User::saveSettings)
…
[rdbms] WatchedItemStore::addWatchBatchForUser localhost: INSERT IGNORE INTO `watchlist` (wl_user,wl_namespace,wl_title,wl_notificationtimestamp) VALUES (65,2,'Test_1724532771985_0.05128293441160503',NULL)
[objectcache] getWithSetCallback(global:rdbms-server-readonly:localhost): process cache hit
[rdbms] WatchedItemStore::addWatchBatchForUser localhost: INSERT IGNORE INTO `watchlist` (wl_user,wl_namespace,wl_title,wl_notificationtimestamp) VALUES (65,3,'Test_1724532771985_0.05128293441160503',NULL)
[rdbms] LocalPasswordPrimaryAuthenticationProvider::providerChangeAuthenticationData localhost: UPDATE `user` SET user_password = ':pbkdf2:…',user_password_expires = NULL WHERE user_name = 'Test 1724532771985'
[rdbms] ActorStore::findActorIdInternal localhost: SELECT actor_user,actor_name,actor_id FROM `actor` WHERE actor_name = 'Test 1724532771985' LIMIT 1
[rdbms] PageStore::getPageByNameViaLinkCache localhost: SELECT page_id,page_namespace,page_title,page_is_redirect,page_is_new,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM `page` WHERE page_namespace = 2 AND page_title = 'Test_1724532771985_0.05128293441160503' LIMIT 1
[rdbms] MediaWiki\CommentStore\CommentStore::createComment localhost: SELECT comment_id FROM `comment` WHERE comment_hash = 0 AND comment_text = '' AND comment_data IS NULL LIMIT 1
[rdbms] ManualLogEntry::insert localhost: INSERT INTO `logging` (log_type,log_action,log_timestamp,log_actor,log_namespace,log_title,log_page,log_params,log_comment_id) VALUES ('newusers','create','20240824205252',22,2,'Test_1724532771985_0.05128293441160503',0,'a:1:{s:9:\"4::userid\";i:65;}',1)
[authentication] AuthManager::continueAccountCreation: Secondary creation passed by EmailNotificationSecondaryAuthenticationProvider
[authentication] AuthManager::continueAccountCreation: Account creation succeeded for Test 1724532771985

…
[authevents] createaccount API attempt
[rdbms] MediaWiki\MediaWikiEntryPoint::commitMainTransaction localhost: COMMIT
…
[DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\SiteStatsUpdate #610
[rdbms] MediaWiki\Deferred\SiteStatsUpdate::doUpdate localhost: INSERT INTO `site_stats` (ss_row_id,ss_total_edits,ss_total_pages,ss_good_articles,ss_users,ss_images) VALUES (1,0,0,0,1,0) ON DUPLICATE KEY UPDATE ss_users = GREATEST(`ss_users`+1,0)
[DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\SiteStatsUpdate #610, processing time: 0.00089311599731445
[DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\MWCallableUpdate_ManualLogEntry->publish #689
[rdbms] UserGroupManager::getUserGroupMemberships localhost: SELECT ug_user,ug_group,ug_expiry FROM `user_groups` WHERE ug_user = 65
[rdbms] MediaWiki\CommentStore\CommentStore::createComment localhost: SELECT comment_id FROM `comment` WHERE comment_hash = 0 AND comment_text = '' AND comment_data IS NULL LIMIT 1
[rdbms] RecentChange::save localhost: INSERT INTO `recentchanges` (rc_timestamp,rc_namespace,rc_title,rc_type,rc_source,rc_minor,rc_this_oldid,rc_last_oldid,rc_bot,rc_ip,rc_patrolled,rc_new,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_comment_id,rc_actor) VALUES ('20240824205252',2,'Test_1724532771985_0.05128293441160503',3,'mw.log',0,0,0,0,'127.0.0.1',2,0,NULL,NULL,0,21,'newusers','create','a:1:{s:9:\"4::userid\";i:65;}',1,22)
[rdbms] ManualLogEntry->publish localhost: COMMIT
[rdbms] WatchedItemStore::updateNotificationTimestamp localhost: SELECT wl_id,wl_user FROM `watchlist` WHERE (wl_user != 65) AND wl_namespace = 2 AND wl_title = 'Test_1724532771985_0.05128293441160503' AND wl_notificationtimestamp IS NULL
[DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\MWCallableUpdate_ManualLogEntry->publish #689, processing time: 0.0039339065551758
[DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\JobQueueEnqueueUpdate #699
[DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\JobQueueEnqueueUpdate #699, processing time: 7.9154968261719E-5
[DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\JobQueueEnqueueUpdate #794
[rdbms] startAtomic: entering level 0 (Wikimedia\Rdbms\Database::onTransactionPreCommitOrIdle)
[rdbms] JobQueueDB::doBatchPush localhost: SELECT job_sha1 FROM `job` WHERE job_sha1 = 'p42glrsl783wj55io7lxhmhxf0e5db5' AND job_token = ''
Job with hash 'p42glrsl783wj55io7lxhmhxf0e5db5' is a duplicate.
[rdbms] endAtomic: leaving level 0 (Wikimedia\Rdbms\Database::onTransactionPreCommitOrIdle)
[rdbms] Wikimedia\Rdbms\Database::onTransactionPreCommitOrIdle localhost: COMMIT
[DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\JobQueueEnqueueUpdate #794, processing time: 0.0020279884338379
MediaWiki\MediaWikiEntryPoint::commitMainTransaction: pre-send deferred updates completed
MediaWiki\MediaWikiEntryPoint::commitMainTransaction: session changes committed
[rdbms] Wikimedia\Rdbms\LoadBalancer::getAnyOpenConnection: found 'round' connection to #0.
[rdbms] Cannot use ChronologyProtector with EmptyBagOStuff
[rdbms] Wikimedia\Rdbms\LBFactory::shutdown: finished ChronologyProtector shutdown
[rdbms] LBFactory shutdown completed
[cookie] setcookie: "UseDC", "master", "1724532782", "/", "", "", "1", ""
[BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#354,replica}
[rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[rdbms] MediaWiki\MediaWikiEntryPoint::commitMainTransaction localhost: COMMIT
MediaWiki\MediaWikiEntryPoint::commitMainTransaction: primary transaction round committed
MediaWiki\MediaWikiEntryPoint::commitMainTransaction: pre-send deferred updates completed
MediaWiki\MediaWikiEntryPoint::commitMainTransaction: session changes committed
[rdbms] LBFactory shutdown completed
[BlockManager] Block cache hit with key BlockCacheKey{request=#246,user=#354,replica}
Output buffer flushed
[rdbms] LBFactory shutdown completed
Request ended normally

With concurrency=2, I can (almost) consistently reproduce the error. Given that 2 is a lot easier to untangle than 4, I'll prefer one that isn't super consistent where I can obtain a detailed sample than a mess where it happens consistently.

With C2, one fails, one succeeds.

script output C2
Error from api.php:  {
  code: 'internal_api_error_DBQueryError',
  info: '[0522a8e6a3ef65bb133fa1af] Exception caught: A database query error has occurred. This may indicate a bug in the software.',
  errorclass: 'Wikimedia\\Rdbms\\DBQueryError'
} at
…
#5 mediawiki/includes/user/User.php(2613): InsertQueryBuilder->execute()
…
#9 mediawiki/includes/user/User.php(2595): DBConnRef->doAtomicSection(string, Closure)
#10 mediawiki/includes/auth/AuthManager.php(1641): User->addToDatabase()
#11 mediawiki/includes/auth/AuthManager.php(1364): AuthManager->continueAccountCreation(array)
#12 mediawiki/includes/api/ApiAMCreateAccount.php(101): AuthManager->beginAccountCreation(User, array, string)
#13 mediawiki/includes/api/ApiMain.php(1948): ApiAMCreateAccount->execute()
…
#18 mediawiki/api.php(44): MediaWiki\MediaWikiEntryPoint->run()

Created [[User:Test 1724533993545 0.9002220726413372]]
debug.log C2 api tokens
[87723] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54253 Accepted
[87723] [Sat Aug 24 22:13:13 2024] Start request POST /api.php?format=json
USER-AGENT: mwbot/2.1.3
CONTENT-LENGTH: 43

[87720] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54254 Accepted
[87720] [Sat Aug 24 22:13:13 2024] Start request POST /api.php?format=json
USER-AGENT: mwbot/2.1.3
CONTENT-LENGTH: 43

[87723] [Sat Aug 24 22:13:13 2024] [rdbms] DatabaseMySQL::open
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] DatabaseMySQL::open

[87720] [Sat Aug 24 22:13:13 2024] ApiMain::setCacheMode: setting cache mode private
[87723] [Sat Aug 24 22:13:13 2024] ApiMain::setCacheMode: setting cache mode private

[87723] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed

[87723] [Sat Aug 24 22:13:13 2024] Request ended normally
[87720] [Sat Aug 24 22:13:13 2024] Request ended normally

[87723] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54253 [200]: POST /api.php?format=json
[87723] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54253 Closing
[87720] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54254 [200]: POST /api.php?format=json
[87720] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54254 Closing

Below process 87720 starts first, and will try to create Test 1724533993536, which will fail due to a deadlock.
Concurrently process 87724 will try to create 1724533993545, which will succeed.

I've added some spacing to group together pairs of lines from both and/or successive lines from the same process, and removed a lot of unrelated debug lines.

debug.log C2 api createaccount
[87720] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54257 Accepted
[87724] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54258 Accepted

[87720] [Sat Aug 24 22:13:13 2024] Start request POST /api.php?format=json
USER-AGENT: mwbot/2.1.3
COOKIE: my_wiki_session=hhv1eag3injfbg9e4hj4frt876puvp83; UseDC=master
CONTENT-LENGTH: 259

[87724] [Sat Aug 24 22:13:13 2024] Start request POST /api.php?format=json
USER-AGENT: mwbot/2.1.3
COOKIE: my_wiki_session=3p5vp9edg1d6d4p2qcc5ah1u8155q2hm; UseDC=master
CONTENT-LENGTH: 259

[87720] [Sat Aug 24 22:13:13 2024] [rdbms] DatabaseMySQL::open
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] DatabaseMySQL::open

[87720] [Sat Aug 24 22:13:13 2024] [BlockManager] Create account block cache miss with key BlockCacheKey{request=#246,user=#354,replica}
[87724] [Sat Aug 24 22:13:13 2024] [BlockManager] Create account block cache miss with key BlockCacheKey{request=#246,user=#354,replica}

[87724] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#354,primary}
[87720] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#354,primary}

[87720] [Sat Aug 24 22:13:13 2024] [rdbms] DatabaseMySQL::open
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] DatabaseMySQL::open

[87724] [Sat Aug 24 22:13:13 2024] [rdbms] Database::beginIfImplied (MediaWiki\Block\DatabaseBlockStore::newLoad) [0.045ms] localhost: BEGIN
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] Database::beginIfImplied (MediaWiki\Block\DatabaseBlockStore::newLoad) [0.037ms] localhost: BEGIN

[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.404ms] localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.396ms] localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))

[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.244ms] localhost: SELECT … FROM `block` … WHERE ((((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.218ms] localhost: SELECT … FROM `block` … WHERE ((((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))

[87724] [Sat Aug 24 22:13:13 2024] [rdbms] TemporaryPasswordPrimaryAuthenticationProvider::testUserExists [0.082ms] localhost: SELECT user_id FROM `user` WHERE user_name = 'Test 1724533993545' LIMIT 1 LOCK IN SHARE MODE
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LocalPasswordPrimaryAuthenticationProvider::testUserExists [0.13ms] localhost: SELECT user_id FROM `user` WHERE user_name = 'Test 1724533993545' LIMIT 1 LOCK IN SHARE MODE
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] User::load [0.12ms] localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724533993545' LIMIT 1 LOCK IN SHARE MODE

[87720] [Sat Aug 24 22:13:13 2024] [rdbms] TemporaryPasswordPrimaryAuthenticationProvider::testUserExists [0.13ms] localhost: SELECT user_id FROM `user` WHERE user_name = 'Test 1724533993536' LIMIT 1 LOCK IN SHARE MODE
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LocalPasswordPrimaryAuthenticationProvider::testUserExists [0.225ms] localhost: SELECT user_id FROM `user` WHERE user_name = 'Test 1724533993536' LIMIT 1 LOCK IN SHARE MODE
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] User::load [0.33ms] localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724533993536' LIMIT 1 LOCK IN SHARE MODE

[87724] [Sat Aug 24 22:13:13 2024] [BlockManager] Create account block cache miss with key BlockCacheKey{request=#246,user=#623,replica}
[87724] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#623,primary}
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.313ms] localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.307ms] localhost: SELECT … FROM `block` … WHERE ((((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))

[87720] [Sat Aug 24 22:13:13 2024] [BlockManager] Create account block cache miss with key BlockCacheKey{request=#246,user=#643,replica}

[87724] [Sat Aug 24 22:13:13 2024] [rdbms] User::load [0.13ms] localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724533993545' LIMIT 1 LOCK IN SHARE MODE

[87720] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#643,primary}

[87724] [Sat Aug 24 22:13:13 2024] [rdbms] User::load [0.112ms] localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724533993545' LIMIT 1
[87724] [Sat Aug 24 22:13:13 2024] [authentication] AuthManager::continueAccountCreation: Primary creation passed by LocalPasswordPrimaryAuthenticationProvider
[87724] [Sat Aug 24 22:13:13 2024] [authentication] Creating user Test 1724533993545 during account creation

[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.467ms] localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))

[87724] [Sat Aug 24 22:13:13 2024] [rdbms] startAtomic: entering level 0 (User::addToDatabase)

[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.405ms] localhost: SELECT … FROM `block` … WHERE ((((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] User::load [0.128ms] localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724533993536' LIMIT 1 LOCK IN SHARE MODE
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] User::load [0.11ms] localhost: SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Test 1724533993536' LIMIT 1
[87720] [Sat Aug 24 22:13:13 2024] [authentication] AuthManager::continueAccountCreation: Primary creation passed by LocalPasswordPrimaryAuthenticationProvider
[87720] [Sat Aug 24 22:13:13 2024] [authentication] Creating user Test 1724533993536 during account creation
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] startAtomic: entering level 0 (User::addToDatabase)
debug.log C2 api createaccount (continued)
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] User::addToDatabase [3.008ms] localhost: INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Test 1724533993545','','','',NULL,'','26e70fdd4c10ecee7c359f48a826750d','20240824211313',0,'20240824211313',0)

[87720] [Sat Aug 24 22:13:13 2024] [rdbms] User::addToDatabase [0.417ms] localhost: INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Test 1724533993536','','','',NULL,'','166615a151d4707bbecafb8ad5ca3415','20240824211313',0,'20240824211313',0)
[87720] [Sat Aug 24 22:13:13 2024] Sat Aug 24 21:13:13 UTC 2024 krinkle-mbp6.local my_wiki Error 1213 from User::addToDatabase, Deadlock found when trying to get lock; try restarting transaction INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Test 1724533993536','','','',NULL,'','166615a151d4707bbecafb8ad5ca3415','20240824211313',0,'20240824211313',0) localhost
#0 mediawiki/includes/libs/rdbms/database/Database.php(1152): Database->getQueryExceptionAndLog(string, int, string, string)
#1 mediawiki/includes/libs/rdbms/database/Database.php(643): Database->reportQueryError(string, int, string, string, bool)
#2 mediawiki/includes/libs/rdbms/database/Database.php(1472): Database->query(Query, string)
#3 mediawiki/includes/libs/rdbms/querybuilder/InsertQueryBuilder.php(343): Database->insert(string, array, string, array)
#4 mediawiki/includes/user/User.php(2613): InsertQueryBuilder->execute()
#5 mediawiki/includes/libs/rdbms/database/Database.php(2276): User->{closure}(DatabaseMySQL, string)
#6 mediawiki/includes/libs/rdbms/database/DBConnRef.php(127): Database->doAtomicSection(string, Closure)
#7 mediawiki/includes/libs/rdbms/database/DBConnRef.php(665): DBConnRef->__call(string, array)
#8 mediawiki/includes/user/User.php(2595): DBConnRef->doAtomicSection(string, Closure)
#9 mediawiki/includes/auth/AuthManager.php(1641): User->addToDatabase()
#10 mediawiki/includes/auth/AuthManager.php(1364): AuthManager->continueAccountCreation(array)
#11 mediawiki/includes/api/ApiAMCreateAccount.php(101): AuthManager->beginAccountCreation(User, array, string)
#12 mediawiki/includes/api/ApiMain.php(1948): ApiAMCreateAccount->execute()
#13 mediawiki/includes/api/ApiMain.php(926): ApiMain->executeAction()
#14 mediawiki/includes/api/ApiMain.php(897): ApiMain->executeActionWithErrorHandling()
#15 mediawiki/includes/api/ApiEntryPoint.php(153): ApiMain->execute()
#16 mediawiki/includes/MediaWikiEntryPoint.php(200): MediaWiki\Api\ApiEntryPoint->execute()
#17 mediawiki/api.php(44): MediaWiki\MediaWikiEntryPoint->run()
#18 {main}
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] ActorStore::findActorIdInternal [0.148ms] localhost: SELECT actor_user,actor_name,actor_id FROM `actor` WHERE actor_name = 'Test 1724533993545' LIMIT 1
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] cancelAtomic: canceling level 0 (User::addToDatabase)
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MWExceptionHandler::rollbackPrimaryChanges [4.467ms] localhost: ROLLBACK
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MWExceptionHandler::rollbackPrimaryChanges: acknowledged server-side transaction loss on localhost
[87720] [Sat Aug 24 22:13:13 2024] 2024-08-24 21:13:13 krinkle-mbp6.local my_wiki: [0522a8e6a3ef65bb133fa1af] /api.php?format=json DBQueryError: Error 1213: Deadlock found when trying to get lock; try restarting transaction
Function: User::addToDatabase
Query: INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Test 1724533993536','','','',NULL,'','166615a151d4707bbecafb8ad5ca3415','20240824211313',0,'20240824211313',0)

#0 mediawiki/includes/libs/rdbms/database/Database.php(1178): Database->getQueryException(string, int, string, string)
#1 mediawiki/includes/libs/rdbms/database/Database.php(1152): Database->getQueryExceptionAndLog(string, int, string, string)
#2 mediawiki/includes/libs/rdbms/database/Database.php(643): Database->reportQueryError(string, int, string, string, bool)
#3 mediawiki/includes/libs/rdbms/database/Database.php(1472): Database->query(Query, string)
#4 mediawiki/includes/libs/rdbms/querybuilder/InsertQueryBuilder.php(343): Database->insert(string, array, string, array)
#5 mediawiki/includes/user/User.php(2613): InsertQueryBuilder->execute()
#6 mediawiki/includes/libs/rdbms/database/Database.php(2276): User->{closure}(DatabaseMySQL, string)
#7 mediawiki/includes/libs/rdbms/database/DBConnRef.php(127): Database->doAtomicSection(string, Closure)
#8 mediawiki/includes/libs/rdbms/database/DBConnRef.php(665): DBConnRef->__call(string, array)
#9 mediawiki/includes/user/User.php(2595): DBConnRef->doAtomicSection(string, Closure)
#10 mediawiki/includes/auth/AuthManager.php(1641): User->addToDatabase()
#11 mediawiki/includes/auth/AuthManager.php(1364): AuthManager->continueAccountCreation(array)
#12 mediawiki/includes/api/ApiAMCreateAccount.php(101): AuthManager->beginAccountCreation(User, array, string)
#13 mediawiki/includes/api/ApiMain.php(1948): ApiAMCreateAccount->execute()
#14 mediawiki/includes/api/ApiMain.php(926): ApiMain->executeAction()
#15 mediawiki/includes/api/ApiMain.php(897): ApiMain->executeActionWithErrorHandling()
#16 mediawiki/includes/api/ApiEntryPoint.php(153): ApiMain->execute()
#17 mediawiki/includes/MediaWikiEntryPoint.php(200): MediaWiki\Api\ApiEntryPoint->execute()
#18 mediawiki/api.php(44): MediaWiki\MediaWikiEntryPoint->run()
#19 {main}

[87720] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: primary transaction round committed
[87720] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: pre-send deferred updates completed
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] ActorStore::acquireActorId [11.454ms] localhost: INSERT IGNORE INTO `actor` (actor_user,actor_name) VALUES (87,'Test 1724533993545')
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] endAtomic: leaving level 0 (User::addToDatabase)
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] Options\LocalUserOptionsStore::fetch [1.034ms] localhost: SELECT up_property,up_value FROM `user_properties` WHERE up_user = 87
[87720] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: session changes committed
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LoadBalancer::getAnyOpenConnection: found 'round' connection to #0.
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] Cannot use ChronologyProtector with EmptyBagOStuff
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory::shutdown: finished ChronologyProtector shutdown
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] startAtomic: entering level 0 (User::saveSettings)
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] User::saveSettings [0.195ms] localhost: UPDATE `user` SET user_name = 'Test 1724533993545',user_real_name = '',user_email = '',user_email_authenticated = NULL,user_touched = '20240824211314',user_token = 'ebd21fd69c7366be68e28de8925bccef',user_email_token = '',user_email_token_expires = NULL WHERE user_id = 87 AND user_touched = '20240824211313'
[87720] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#354,replica}
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] User::saveSettings [0.095ms] localhost: UPDATE `actor` SET actor_name = 'Test 1724533993545' WHERE actor_user = 87
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] endAtomic: leaving level 0 (User::saveSettings)
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] Database::beginIfImplied (MediaWiki\Block\DatabaseBlockStore::newLoad) [0.047ms] localhost: BEGIN
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.335ms] localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\MediaWikiEntryPoint::commitMainTransaction [0.06ms] localhost: COMMIT
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] WatchedItemStore::addWatchBatchForUser [0.104ms] localhost: INSERT IGNORE INTO `watchlist` (wl_user,wl_namespace,wl_title,wl_notificationtimestamp) VALUES (87,2,'Test_1724533993545',NULL)
[87720] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: primary transaction round committed
[87720] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: pre-send deferred updates completed
[87720] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: session changes committed
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LoadBalancer::getAnyOpenConnection: found 'round' connection to #0.
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] WatchedItemStore::addWatchBatchForUser [0.073ms] localhost: INSERT IGNORE INTO `watchlist` (wl_user,wl_namespace,wl_title,wl_notificationtimestamp) VALUES (87,3,'Test_1724533993545',NULL)
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory::shutdown: finished ChronologyProtector shutdown
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed
[87720] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache hit with key BlockCacheKey{request=#246,user=#354,replica}
[87720] [Sat Aug 24 22:13:13 2024] Output buffer flushed
[87720] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed
[87720] [Sat Aug 24 22:13:13 2024] Request ended normally
[87720] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54257 [200]: POST /api.php?format=json
[87720] [Sat Aug 24 22:13:13 2024] [session] Saving all sessions on shutdown
[87720] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54257 Closing
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LocalPasswordPrimaryAuthenticationProvider::providerChangeAuthenticationData [0.306ms] localhost: UPDATE `user` SET user_password = ':pbkdf2:…'
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] ActorStore::findActorIdInternal [0.202ms] localhost: SELECT actor_user,actor_name,actor_id FROM `actor` WHERE actor_name = 'Test 1724533993545' LIMIT 1
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] PageStore::getPageByNameViaLinkCache [0.404ms] localhost: SELECT page_id,page_namespace,page_title,page_is_redirect,page_is_new,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM `page` WHERE page_namespace = 2 AND page_title = 'Test_1724533993545' LIMIT 1
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\CommentStore\CommentStore::createComment [0.153ms] localhost: SELECT comment_id FROM `comment` WHERE comment_hash = 0 AND comment_text = '' AND comment_data IS NULL LIMIT 1
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] ManualLogEntry::insert [0.128ms] localhost: INSERT INTO `logging` (log_type,log_action,log_timestamp,log_actor,log_namespace,log_title,log_page,log_params,log_comment_id) VALUES ('newusers','create','20240824211313',32,2,'Test_1724533993545',0,'a:1:{s:9:\"4::userid\";i:87;}',1)
[87724] [Sat Aug 24 22:13:13 2024] [authentication] AuthManager::continueAccountCreation: Secondary creation passed by EmailNotificationSecondaryAuthenticationProvider
[87724] [Sat Aug 24 22:13:13 2024] [authentication] AuthManager::continueAccountCreation: Account creation succeeded for Test 1724533993545
[87724] [Sat Aug 24 22:13:13 2024] [authevents] createaccount API attempt
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\MediaWikiEntryPoint::commitMainTransaction [0.116ms] localhost: COMMIT
[87724] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: primary transaction round committed
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\SiteStatsUpdate #637
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Deferred\SiteStatsUpdate::doUpdate [0.358ms] localhost: INSERT INTO `site_stats` (ss_row_id,ss_total_edits,ss_total_pages,ss_good_articles,ss_users,ss_images) VALUES (1,0,0,0,1,0) ON DUPLICATE KEY UPDATE ss_users = GREATEST(`ss_users`+1,0)
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\SiteStatsUpdate #637, processing time: 0.0010108947753906
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\MWCallableUpdate_ManualLogEntry->publish #687
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] Database::beginIfImplied (UserGroupManager::getUserGroupMemberships) [0.08ms] localhost: BEGIN
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] UserGroupManager::getUserGroupMemberships [0.46ms] localhost: SELECT ug_user,ug_group,ug_expiry FROM `user_groups` WHERE ug_user = 87
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\CommentStore\CommentStore::createComment [0.313ms] localhost: SELECT comment_id FROM `comment` WHERE comment_hash = 0 AND comment_text = '' AND comment_data IS NULL LIMIT 1
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] RecentChange::save [0.251ms] localhost: INSERT INTO `recentchanges` (rc_timestamp,rc_namespace,rc_title,rc_type,rc_source,rc_minor,rc_this_oldid,rc_last_oldid,rc_bot,rc_ip,rc_patrolled,rc_new,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_comment_id,rc_actor) VALUES ('20240824211313',2,'Test_1724533993545',3,'mw.log',0,0,0,0,'127.0.0.1',2,0,NULL,NULL,0,31,'newusers','create','a:1:{s:9:\"4::userid\";i:87;}',1,32)
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] ManualLogEntry->publish [0.217ms] localhost: COMMIT
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] WatchedItemStore::updateNotificationTimestamp [0.293ms] localhost: SELECT wl_id,wl_user FROM `watchlist` WHERE (wl_user != 87) AND wl_namespace = 2 AND wl_title = 'Test_1724533993545' AND wl_notificationtimestamp IS NULL
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\MWCallableUpdate_ManualLogEntry->publish #687, processing time: 0.004694938659668
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\JobQueueEnqueueUpdate #697
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\JobQueueEnqueueUpdate #697, processing time: 8.0108642578125E-5
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: started MediaWiki\Deferred\JobQueueEnqueueUpdate #663
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] Database::onTransactionPreCommitOrIdle [0.066ms] localhost: BEGIN
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] startAtomic: entering level 0 (Database::onTransactionPreCommitOrIdle)
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] JobQueueDB::doBatchPush [0.226ms] localhost: SELECT job_sha1 FROM `job` WHERE job_sha1 IN ('1tuzncxk0bhaps4flhp4x2onbku6df3','p42glrsl783wj55io7lxhmhxf0e5db5') AND job_token = ''
[87724] [Sat Aug 24 22:13:13 2024] Job with hash '1tuzncxk0bhaps4flhp4x2onbku6df3' is a duplicate.
[87724] [Sat Aug 24 22:13:13 2024] Job with hash 'p42glrsl783wj55io7lxhmhxf0e5db5' is a duplicate.
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] endAtomic: leaving level 0 (Database::onTransactionPreCommitOrIdle)
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] Database::onTransactionPreCommitOrIdle [0.061ms] localhost: COMMIT
[87724] [Sat Aug 24 22:13:13 2024] [DeferredUpdates] DeferredUpdates::run: ended MediaWiki\Deferred\JobQueueEnqueueUpdate #663, processing time: 0.0019738674163818
[87724] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: pre-send deferred updates completed
[87724] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: session changes committed
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LoadBalancer::getAnyOpenConnection: found 'round' connection to #0.
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] Cannot use ChronologyProtector with EmptyBagOStuff
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory::shutdown: finished ChronologyProtector shutdown
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed
[87724] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache miss with key BlockCacheKey{request=#246,user=#354,replica}
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] Database::beginIfImplied (MediaWiki\Block\DatabaseBlockStore::newLoad) [0.057ms] localhost: BEGIN
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\Block\DatabaseBlockStore::newLoad [0.502ms] localhost: SELECT … FROM `block` … WHERE ((bt_address = '127.0.0.1' OR ((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001'))))
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] MediaWiki\MediaWikiEntryPoint::commitMainTransaction [0.058ms] localhost: COMMIT
[87724] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: primary transaction round committed
[87724] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: pre-send deferred updates completed
[87724] [Sat Aug 24 22:13:13 2024] MediaWiki\MediaWikiEntryPoint::commitMainTransaction: session changes committed
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LoadBalancer::getAnyOpenConnection: found 'round' connection to #0.
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory::shutdown: finished ChronologyProtector shutdown
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed
[87724] [Sat Aug 24 22:13:13 2024] [BlockManager] Block cache hit with key BlockCacheKey{request=#246,user=#354,replica}
[87724] [Sat Aug 24 22:13:13 2024] Output buffer flushed
[87724] [Sat Aug 24 22:13:13 2024] [rdbms] LBFactory shutdown completed
[87724] [Sat Aug 24 22:13:13 2024] Request ended normally
[87724] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54258 [200]: POST /api.php?format=json
[87724] [Sat Aug 24 22:13:13 2024] [session] Saving all sessions on shutdown
[87724] [Sat Aug 24 22:13:13 2024] 127.0.0.1:54258 Closing

Reducing it down further, to what I believe to be "locking" selects or write ueries inside their main transaction:

rdbms C2 api createaccount
[87720] [rdbms] DatabaseMySQL::open
[87724] [rdbms] DatabaseMySQL::open

[87724] [rdbms] Database::beginIfImplied (DatabaseBlockStore::newLoad) BEGIN
[87720] [rdbms] Database::beginIfImplied (DatabaseBlockStore::newLoad) BEGIN

[87724] [rdbms] TemporaryPasswordPrimaryAuthenticationProvider::testUserExists SELECT user_id FROM `user` WHERE user_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE
[87724] [rdbms] LocalPasswordPrimaryAuthenticationProvider::testUserExists SELECT user_id FROM `user` WHERE user_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE
[87724] [rdbms] User::load SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE

[87720] [rdbms] TemporaryPasswordPrimaryAuthenticationProvider::testUserExists SELECT user_id FROM `user` WHERE user_name = 'Debug20' LIMIT 1 LOCK IN SHARE MODE
[87720] [rdbms] LocalPasswordPrimaryAuthenticationProvider::testUserExists SELECT user_id FROM `user` WHERE user_name = 'Debug20' LIMIT 1 LOCK IN SHARE MODE
[87720] [rdbms] User::load SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Debug20' LIMIT 1 LOCK IN SHARE MODE

[87724] [rdbms] User::load SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE

[87724] [rdbms] User::addToDatabase [3.008ms] INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Debug24','','','',NULL,'','26e70fdd4c10ecee7c359f48a826750d','20240824211313',0,'20240824211313',0)

[87720] [rdbms] User::addToDatabase [0.417ms] INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Debug20','','','',NULL,'','166615a151d4707bbecafb8ad5ca3415','20240824211313',0,'20240824211313',0)

I can reproduce this from two mysql client tabs in bash, separate from MediaWiki, and separate from all the (skipped) connection flags and other unrelated read queries:

  • Tab 20: mysql -u root -proot
  • Tab 24: mysql -u root -proot
SELECT VERSION();
-- 11.5.2-MariaDB
mysql
[24] use my_wiki; BEGIN;
[20] use my_wiki; BEGIN;

[24] SELECT user_id FROM `user` WHERE user_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE;
[24] SELECT user_id FROM `user` WHERE user_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE;
[24] SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE;

[20] SELECT user_id FROM `user` WHERE user_name = 'Debug20' LIMIT 1 LOCK IN SHARE MODE;
[20] SELECT user_id FROM `user` WHERE user_name = 'Debug20' LIMIT 1 LOCK IN SHARE MODE;
[20] SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Debug20' LIMIT 1 LOCK IN SHARE MODE;

[24] SELECT actor_id,actor_user,actor_name FROM `actor` WHERE actor_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE;

[24] INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Debug24','','','',NULL,'','26e70fdd4c10ecee7c359f48a826750d','20240824211313',0,'20240824211313',0);
[24] # Hangs until [20] disconnects due to deadlock

[20] INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Debug20','','','',NULL,'','166615a151d4707bbecafb8ad5ca3415','20240824211313',0,'20240824211313',0);
[20] # Deadlock found when trying to get lock; try restarting transaction

Note that I'm ctrl-D'ing these to cancel the transaction, making it easy to repeat the test with the same two named Debug user names.

Reduced further to rule out the duplicate user queries and actor queries:

mysql
[24] use my_wiki; BEGIN;
[20] use my_wiki; BEGIN;

[24] SELECT user_id FROM `user` WHERE user_name = 'Debug24' LIMIT 1 LOCK IN SHARE MODE;
[20] SELECT user_id FROM `user` WHERE user_name = 'Debug20' LIMIT 1 LOCK IN SHARE MODE;

[24] INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Debug24','','','',NULL,'','26e70fdd4c10ecee7c359f48a826750d','20240824211313',0,'20240824211313',0);
[24] # Hangs until [20] disconnects due to deadlock

[20] INSERT IGNORE INTO `user` (user_name,user_password,user_newpassword,user_email,user_email_authenticated,user_real_name,user_token,user_registration,user_editcount,user_touched,user_is_temp) VALUES ('Debug20','','','',NULL,'','166615a151d4707bbecafb8ad5ca3415','20240824211313',0,'20240824211313',0);
[20] # Deadlock found when trying to get lock; try restarting transaction

So at glance the deadlock doesn't make sense since from the business logic perspective, these should be unrelated gap locks for distinct would-be rows with a clearly different user_name.

The first thing that comes to mind, is that maybe the SELECT user_id is creating some kind of lock on the auto-increment primary key. That doesn't make it right, but I'm trying to narrow it down. Changing it to SELECT user_name, however, doesn't make a difference and reproduces the same deadlock in the same trivial manner as above.

Next, given the user table is fairly wide and has numerous indexes, I went looking for a simpler table that has just these two characteristics in common with user:

  • an "id" field that is auto-increment, and the primary key
  • a string field that makes up a unique index consisting of just that one field.
  • no other columns or indexes that relate to the ID or unique field.

I picked category:

CREATE TABLE `category` (
  `cat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `cat_title` varbinary(255) NOT NULL,
  `cat_pages` int(11) NOT NULL DEFAULT 0,
  `cat_subcats` int(11) NOT NULL DEFAULT 0,
  `cat_files` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`cat_id`),
  UNIQUE KEY `cat_title` (`cat_title`),
  KEY `cat_pages` (`cat_pages`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=binary

And indeed the same simple steps reproduce the deadlock on that table as well:

mysql
[24] use my_wiki; BEGIN;
[20] use my_wiki; BEGIN;

[24] SELECT cat_title FROM `category` WHERE cat_title = 'T199393_krinkle_debug24' LIMIT 1 LOCK IN SHARE MODE;
[20] SELECT cat_title FROM `category` WHERE cat_title = 'T199393_krinkle_debug20' LIMIT 1 LOCK IN SHARE MODE;

[24] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug24',1,2,3);
[24] # Hangs until [20] disconnects due to deadlock

[20] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug20',4,5,6);
[20] # Deadlock found when trying to get lock; try restarting transaction

With this in hand, I went to reproduce it in production. Something about how simple this is doesn't seem right, and I sense that maybe production is configured differently such that this doesn't happen as easily there, maybe. If so, that would mean 1) the above is still worth fixing but slightly less severe, and 2) would mean T371040 is caused by a more complex scenario than this. Although it does seem likely that fixing the simple scenario would migitate the more complex scenario as well.

workstation$ ssh mwmaint
mwmaint$ screen -DR
[24] mwmaint$ sql testwiki
[20] mwmaint$ sql testwiki
[24] BEGIN;
[20] BEGIN;
[24] SELECT cat_title FROM `category` WHERE cat_title = 'T199393_krinkle_debug24' LIMIT 1 LOCK IN SHARE MODE;
[20] SELECT cat_title FROM `category` WHERE cat_title = 'T199393_krinkle_debug20' LIMIT 1 LOCK IN SHARE MODE;
[24] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug24',1,2,3);
[24] # Hangs until [20] disconnects due to deadlock
[20] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug20',4,5,6);
[20] # Deadlock found when trying to get lock; try restarting transaction
[24] # Query OK, 1 row affected (4.2 sec)
[20] ^D
Bye
[24] ^D
Bye
-- testwiki> SHOW CREATE TABLE category;

CREATE TABLE `category` (
  `cat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `cat_title` varbinary(255) NOT NULL DEFAULT '',
  `cat_pages` int(11) NOT NULL DEFAULT 0,
  `cat_subcats` int(11) NOT NULL DEFAULT 0,
  `cat_files` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`cat_id`),
  UNIQUE KEY `cat_title` (`cat_title`),
  KEY `cat_pages` (`cat_pages`)
) ENGINE=InnoDB AUTO_INCREMENT=114279 DEFAULT CHARSET=binary

We can reduce it further by removing the SELECT … LOCK IN SHARE MODE. In that case, there will not be a deadlock. However, there will still be a full table lock, synchronously holding back any concurrent INSERT to the same table.

Let's step back a bit though. Why would someoneone even want to do a SELECT … LOCK IN SHARE MODE query? At face value, it looks like the intention may be to prevent the row from being changed (if it exists) and prevent anyone else from creating it if it doesn't exist. There's a few problems with that.

InnoDB supports row-level locking. […] a lock is acquired on the rows read by the query, and it will be released when the current transaction is committed.
When LOCK IN SHARE MODE is specified in a SELECT statement, MariaDB will wait until all transactions that have modified the rows are committed. Then, a write lock is acquired. All transactions can read the rows, but if they want to modify them, they have to wait until your transaction is committed.

This works for rows that exist. For rows that don't exist, it returns an empty result, and let's you (plus anyone else trying to create the same account) continue to try to make an INSERT. So this is entirely ineffective at early prevention of duplicate account creation. Username conflicts are caught only at the INSERT.

What kind of lock is created when you lock data that doesn't exist? Well, I imagine the author of this code hoped it might lock the idea of a row existing with that username, given that the user table has a unique index. Unfortunately, while I can't find a specific point in the MariaDB explaining this, it seems in practice that there is no such thing as holding a lock on an imagined point in a unique index. (Or rather, it doesn't exist when done on a table with an auto-incrementing primary key, on a database that uses statement-based replication. I haven't tried to reproduce it outside those conditions. Locking imaginary rows in a unique index might work outside of that?)

Conclusion: When creating new rows into a table, there is no point in doing a SELECT … LOCK IN SHARE MODE query. You can drop the LOCK part and achieve the same: If two requests try to do the same thing, they will both read the empty dataset from the SELECT query, and then both will try to do an INSERT. The difference is that if you remove the lock, the two queries will merely wait for each other, instead of deadlocking.

This makes sense since LOCK IN SHARE MODE is specific for when you read data that informs a future write, and that's not really the case here. There's no data being selected. It's not clear to me why this footgun exists, but it seems to me to a first approximation, that if your intent is to create data rather than to update data, you should never use LOCK IN SHARE MODE.

I guess I knew this, since we've recommend for years against LOCK IN SHARE MODE in new code due to likelihood of causing deadlocks. The part I learned today, is that there is an asymmetry between what this locks vs what it waits for. Let's read the docs again:

When LOCK IN SHARE MODE is specified in a SELECT statement, MariaDB will wait until all transactions that have modified the rows are committed. Then, a write lock is acquired. All transactions can read the rows, but if they want to modify them, they have to wait until your transaction is committed.

I would have guessed one of two things to be true:

  • Locking an imaginary row is a concept, and it will lock that concept, and wait for transactions that modify that, then lock it, and hold back any attempts to insert a matchign row.
  • Locking imagined rows is not a concept, it will wait for any transactions writing to the table, then lock the table, then hold back any attempts to insert anywhere in the table.

Instead, it seems it will wait for no-one, yet lock everything.

Consecutive Lock Mode:

In this mode, InnoDB holds a table-level lock for all bulk INSERT statements (such as … INSERT ... SELECT) until the end of the statement. For simple INSERT statements, no table-level lock is held. Instead, a lightweight mutex is used which scales significantly better. This is the default setting.

Interleaved Lock Mode:
In this mode, InnoDB does not hold any table-level locks at all. This is the fastest and most scalable mode, but is not safe for statement-based replication.

So this says simple INSERT statement should not lock each other. So let's get rid of the LOCK IN SHARE MODE and try the same inserts without this.

mysql
[24] use my_wiki; BEGIN;
[20] use my_wiki; BEGIN;

[24] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug24',1,2,3);
[24] # Query OK, 1 row affected (0.001 sec)

[20] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug20',4,5,6);
[20] # Hangs until [24] commits or rollback due to full table lock.
[20] # If [24] rolls back, this "Query OK, 1 row affected"
[20] # If [20] commits, this will "Query OK, 0 rows affected, 1 warning"

Good news: No deadlock. So definitely an improvement over the status quo.

Bad news: Inserts still cause a synchronous blockage, with a full table lock, seemingly allowing for no concurrent inserts.

I'm not willing to believe this is merely do to having an auto increment on the table. We have auto increments on tables like revision and recentchanges as well which are super write heavy and I don't believe we're synchronously blocking those inserts in production, with each blocked on completion of the previous transaction.

Assuming that's true, that leaves us with the unique index as possible factor that might cause this for the user table.

I went looking for a table with an auto-increment primary key, and no unique index. And found: externalinks.

SHOW CREATE TABLE externallinks;

CREATE TABLE `externallinks` (
  `el_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `el_from` int(10) unsigned NOT NULL DEFAULT 0,
  `el_to_domain_index` varbinary(255) NOT NULL DEFAULT '',
  `el_to_path` blob DEFAULT NULL,
  PRIMARY KEY (`el_id`),
  KEY `el_from` (`el_from`),
  KEY `el_to_domain_index_to_path` (`el_to_domain_index`,`el_to_path`(60))
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=binary
mysql
[24] use my_wiki; BEGIN;
[20] use my_wiki; BEGIN;

[24] INSERT IGNORE INTO `externallinks` (el_from,el_to_domain_index,el_to_path) VALUES (1,'http://test.example','/T199393_krinkle_debug24');
[24] # Query OK, 1 row affected (0.001 sec)

[20] INSERT IGNORE INTO `externallinks` (el_from,el_to_domain_index,el_to_path) VALUES (2,'http://test.example','/T199393_krinkle_debug20');
[20] # Query OK, 1 row affected (0.001 sec)

[20] INSERT IGNORE INTO `externallinks` (el_from,el_to_domain_index,el_to_path) VALUES (1,'http://test.example','/T199393_krinkle_debug24');
[20] # Query OK, 1 row affected (0.001 sec)

No delay, even when inserting the exact same row twice. Alright, that's more like it. So...

Conclusion: In a table with a unique index and an auto-increment, on a database cluster that uses statement-based replication, inserts are inherently slow by default with no recourse? I hope I'm wrong, but, it does appear to be true in even the most simple of circumstances, and reproducible both on the default MariaDB configuration, and what we have at WMF in production.

This suggests that any 2 or more account creations that overlap on the same wiki, cause the first to stall, and the others to hard fail with a deadlock, and once overlapping creations have finished failing, the first will succeed after long delay. The delay isn't due to the insert being slow or whatever, but it's because the concurrent connections have to die (and they will, because they will all get a deadlock) before the first one can get its result.

Action items:

  • Ditch the SELECT ...LOCK. If it was doing something useful (git blame to find out?), then figure out a different solution for it.
  • If we really need the transaction to fail early, we could do a named non-blocking lock via GET_LOCK() with a string that includes the user name, that way we'll know right away if another request is doing the same thing and fail early. Such locks incurr extra roundtrips however, thus slowing down all account creations. Given that inserts on the user table inherently locks the entire table, the last thing we need is more database roundtrips during account creation, as that would pronlong those transactions and thus increase latency in ways that scale badly. Besides, today this lock isn't actually preventing two requests from trying to create an account with the same name, as proven by the deadlock during the INSERT (which is half-way those requests).
  • Make the transaction as short as we can by making sure the insert is as close to the end of the request as possible.

To the last point, I didn't comment on it before, but the above analysis in my last three comments, shows a ton of queries. It seems very unlikely that we need so many repeated queries on the user table and ipblocks table over and over again. There seems to be something wrong there. Both in terms of batching oppertunity as well as lacking in-request caching for repeat access (granted, I used Session/MainStash/MicroStash=SqlBag; MainCache=EmptyBag; so maybe it's using memcached in practice, but that's still a network roundtrip for repeat access).

I'm actually starting to doubt the entire notion of why PermissionManager and BlockManager even support the concept of verifying permissions and blocks on the primary DB. Is it really worth all primary-db load and transaction prolonging to enforce changes in those by a few milliseconds sooner? Did we introduce that on purpose? Or was it historically all on the primary and did we gradually introducer less rigor/quicker versions that we never pulled the trigger on standardising around? (TODO: git-blame on this)

Running it on Quickstart...

The page (on meta) does not exist. Did you want to link to mediawiki? https://www.mediawiki.org/wiki/Quickstart

Let's step back a bit though. Why would someoneone even want to do a SELECT … LOCK IN SHARE MODE query? At face value, it looks like the intention may be to prevent the row from being changed (if it exists) and prevent anyone else from creating it if it doesn't exist. There's a few problems with that.

It can be useful for transactions with a write query that depends on prior read query of the set of rows matching a predicate. It should be avoided if the matching rows will be updated. In practice, it should only be used when:
a) The predicate specifies a small set of possible value n-tuples for all columns of a single unique key, where a row usually exists for each tuple. For example, a page_id IN (x,y,z) predicate for the SELECT.
b) The predicate specifies a small set of possible value n-tuples for the first n columns of an index (unique or not), there are many different n-tuples values that correspond to existing rows, the number of existing rows that correspond to any given n-tuple is not huge, and concurrent writes are expected to rarely involve the same n-tuples. For example a pl_from=X predicate corresponds to the prefix (pl_from) of the (pl_from, pl_target_id) index. This could be used to compute some per-page "link count" row in another table. Wrapper locks can easily be used to lower contention, since the n-tuple is foreign key, e.g. GET_LOCK("somelock-X") and "SELECT FROM page WHERE page_id=X FOR UPDATE".

The second case does not come up very often. In the first case, callers could probably also just use FOR UPDATE, to keep things simple and reduce the chance of "S/X gap => IX" and "S => X" lock escalation foot-guns. We have lots of extensions adding lots of writes to transactions, so it's often tricky to say "this transaction probably won't try to come back and write to this row"...I suppose certain components with strong ownership of specific tables could be rely on such logic (e.g. NameTableStore).

What kind of lock is created when you lock data that doesn't exist? Well, I imagine the author of this code hoped it might lock the idea of a row existing with that username, given that the user table has a unique index. Unfortunately, while I can't find a specific point in the MariaDB explaining this, it seems in practice that there is no such thing as holding a lock on an imagined point in a unique index. (Or rather, it doesn't exist when done on a table with an auto-incrementing primary key, on a database that uses statement-based replication. I haven't tried to reproduce it outside those conditions. Locking imaginary rows in a unique index might work outside of that?)

Whether replication is statement or binlog doesn't directly matter here. The lock that is created in this case is a gap lock affixed to the first record in the index after the (index coordinate) position where no record exists (the dummy supremum "record" is used if none). The lock prohibits insertion into the (index coordinate) gap before that record. Note that both committed and uncommitted records will be made use of to affix gap locks.

Conclusion: When creating new rows into a table, there is no point in doing a SELECT … LOCK IN SHARE MODE query. You can drop the LOCK part and achieve the same: If two requests try to do the same thing, they will both read the empty dataset from the SELECT query, and then both will try to do an INSERT. The difference is that if you remove the lock, the two queries will merely wait for each other, instead of deadlocking.

Yep.

I would have guessed one of two things to be true:

  • Locking an imaginary row is a concept, and it will lock that concept, and wait for transactions that modify that, then lock it, and hold back any attempts to insert a matchign row.
  • Locking imagined rows is not a concept, it will wait for any transactions writing to the table, then lock the table, then hold back any attempts to insert anywhere in the table.

Instead, it seems it will wait for no-one, yet lock everything.

Yes, intuition leans towards "locking the predicate" though every DB I'm aware of uses some kind of gap locking as a practical but overzealous "predicate lock". Gap locks in mariadb could at least be less contentious if they could affix to their own entries in system predicate trees rather than only being able to affix to records that happen to already be in the index. This would have complexity and overhead trade-offs though.

mysql
[24] use my_wiki; BEGIN;
[20] use my_wiki; BEGIN;

[24] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug24',1,2,3);
[24] # Query OK, 1 row affected (0.001 sec)

[20] INSERT IGNORE INTO `category` (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('T199393_krinkle_debug20',4,5,6);
[20] # Hangs until [24] commits or rollback due to full table lock.
[20] # If [24] rolls back, this "Query OK, 1 row affected"
[20] # If [20] commits, this will "Query OK, 0 rows affected, 1 warning"

Good news: No deadlock. So definitely an improvement over the status quo.

Bad news: Inserts still cause a synchronous blockage, with a full table lock, seemingly allowing for no concurrent inserts.

I am not seeing this behavior. Both insert without blocking.

Conclusion: In a table with a unique index and an auto-increment, on a database cluster that uses statement-based replication, inserts are inherently slow by default with no recourse? I hope I'm wrong, but, it does appear to be true in even the most simple of circumstances, and reproducible both on the default MariaDB configuration, and what we have at WMF in production.

Seems like something got mixed up in your setup. I guess if you set innodb_autoinc_lock_mode = 0 (we use 1) and have a table with autoincrement column C1, then INSERTs that omit C1 values, that block due to a conflicting INSERT or "SELECT ... LOCK IN SHARE MODE" would, in turn, block other INSERTs that omit C1. This would be due to the autoinc lock being held during the whole INSERT statement rather than briefly before it.

This suggests that any 2 or more account creations that overlap on the same wiki, cause the first to stall, and the others to hard fail with a deadlock, and once overlapping creations have finished failing, the first will succeed after long delay. The delay isn't due to the insert being slow or whatever, but it's because the concurrent connections have to die (and they will, because they will all get a deadlock) before the first one can get its result.

If you are using auto-generated names like "Debug<sequence #>", you will experience this. If the user table is nearly empty and a bunch of user creation requests come in it will happen to. In both cases, the names pidgeonhole to the same gaps. In other cases, concurrency is possible.

Action items:

  • Ditch the SELECT ...LOCK. If it was doing something useful (git blame to find out?), then figure out a different solution for it.
  • If we really need the transaction to fail early, we could do a named non-blocking lock via GET_LOCK() with a string that includes the user name, that way we'll know right away if another request is doing the same thing and fail early. Such locks incurr extra roundtrips however, thus slowing down all account creations. Given that inserts on the user table inherently locks the entire table, the last thing we need is more database roundtrips during account creation, as that would pronlong those transactions and thus increase latency in ways that scale badly. Besides, today this lock isn't actually preventing two requests from trying to create an account with the same name, as proven by the deadlock during the INSERT (which is half-way those requests).

Maybe the SELECT was to avoid large autoincrement gaps or something or binlog spam for the replicas (e.g. when the conflicted was already committed when the transaction doing the INSERT started). Are there a lot of ways that these user insertion code paths can triggered by things like account autocreation? If that's not a problem, then using the INSERT IGNORE alone seems fine.

If we need to avoid doing INSERT when a row is already there, then it's trickier. A regular SELECT might not see rows that are pending or were committed after the start of the transaction (REPEATABLE-READ), nor would it stop the row it saw from being changed (we don't really DELETE users but a change to user_name is possible).

  • Make the transaction as short as we can by making sure the insert is as close to the end of the request as possible.

Yeah, there are some duplicate looking queries. Improving that might require wrangling through different compontent boundaries.

I'm actually starting to doubt the entire notion of why PermissionManager and BlockManager even support the concept of verifying permissions and blocks on the primary DB. Is it really worth all primary-db load and transaction prolonging to enforce changes in those by a few milliseconds sooner? Did we introduce that on purpose? Or was it historically all on the primary and did we gradually introducer less rigor/quicker versions that we never pulled the trigger on standardising around? (TODO: git-blame on this)

For observable state-changing requests (e.g. "write requests") checking permissions on the primary seems reasonable, since it is about reads that determine writes, using DB_PRIMARY is the usual minimum (with the adding of FOR UPDATE being more correct). I suppose the permission stuff has gotten more complicated and heavy weight over the years, though.

I previously spent a bunch of time following the trail of Anomie regarding InnoDB deadlock. Beside some analyze I did above, I have added to our test runner and CI job the ability for InnoDB to write the deadlock to the MySQL error log which are collected by Jenkins (T342088). Then, I never looked back.

I went on the Jenkins server (contint.wikimedia.org) went for a grep through all Selenium builds and there was only one match:

$ sudo -u jenkins grep deadlock /srv/jenkins/builds/*selenium*/*/archive/log/my*
quibble-vendor-mysql-php74-selenium/36697/archive/log/mysql-error.log:2024-09-06 14:13:25 1067 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.

That is https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-selenium/36697/ and here is the full log:

12024-09-06 14:09:41 0 [Note] Starting MariaDB 10.3.39-MariaDB-0+deb10u2 source revision ca001cf2048f0152689e1895e2dc15486dd0b1af as process 2833
22024-09-06 14:09:42 0 [Note] InnoDB: Using Linux native AIO
32024-09-06 14:09:42 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
42024-09-06 14:09:42 0 [Note] InnoDB: Uses event mutexes
52024-09-06 14:09:42 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
62024-09-06 14:09:42 0 [Note] InnoDB: Number of pools: 1
72024-09-06 14:09:42 0 [Note] InnoDB: Using SSE2 crc32 instructions
82024-09-06 14:09:42 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
92024-09-06 14:09:42 0 [Note] InnoDB: Completed initialization of buffer pool
102024-09-06 14:09:42 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
112024-09-06 14:09:42 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
122024-09-06 14:09:42 0 [Note] InnoDB: Creating shared tablespace for temporary tables
132024-09-06 14:09:42 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
142024-09-06 14:09:42 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
152024-09-06 14:09:42 0 [Note] InnoDB: Waiting for purge to start
162024-09-06 14:09:42 0 [Note] InnoDB: 10.3.39 started; log sequence number 1625419; transaction id 20
172024-09-06 14:09:42 0 [Note] InnoDB: Loading buffer pool(s) from /workspace/db/quibble-mysql-w07k9htb/ib_buffer_pool
182024-09-06 14:09:42 0 [Note] Plugin 'FEEDBACK' is disabled.
192024-09-06 14:09:42 0 [Note] InnoDB: Buffer pool(s) load completed at 240906 14:09:42
202024-09-06 14:09:42 0 [Note] Reading of all Master_info entries succeeded
212024-09-06 14:09:42 0 [Note] Added new Master_info '' to hash table
222024-09-06 14:09:42 0 [Note] /usr/sbin/mysqld: ready for connections.
23Version: '10.3.39-MariaDB-0+deb10u2' socket: '/workspace/db/quibble-mysql-w07k9htb/socket' port: 0 Debian 10
242024-09-06 14:13:25 1067 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
252024-09-06 14:13:25 1067 [Note] InnoDB:
26*** (1) TRANSACTION:
27
28TRANSACTION 4339, ACTIVE 0 sec inserting
29mysql tables in use 1, locked 1
30LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
31MySQL thread id 1068, OS thread handle 140553551611648, query id 22452 localhost wikiuser Update
32INSERT /* MediaWiki\CheckUser\Jobs\UpdateUserCentralIndexJob::run */ INTO `cuci_user` (ciu_timestamp,ciu_ciwm_id,ciu_central_id) VALUES ('20240906141313',1,14) ON DUPLICATE KEY UPDATE ciu_timestamp = '20240906141313'
332024-09-06 14:13:25 1067 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
34
35RECORD LOCKS space id 140 page no 3 n bits 80 index PRIMARY of table `wikidb`.`cuci_user` trx id 4339 lock_mode X insert intention waiting
36Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
37 0: len 8; hex 73757072656d756d; asc supremum;;
38
392024-09-06 14:13:25 1067 [Note] InnoDB: *** (2) TRANSACTION:
40
41TRANSACTION 4340, ACTIVE 0 sec inserting
42mysql tables in use 1, locked 1
433 lock struct(s), heap size 1128, 2 row lock(s)
44MySQL thread id 1067, OS thread handle 140553551918848, query id 22453 localhost wikiuser Update
45INSERT /* MediaWiki\CheckUser\Jobs\UpdateUserCentralIndexJob::run */ INTO `cuci_user` (ciu_timestamp,ciu_ciwm_id,ciu_central_id) VALUES ('20240906141313',1,14) ON DUPLICATE KEY UPDATE ciu_timestamp = '20240906141313'
462024-09-06 14:13:25 1067 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
47
48RECORD LOCKS space id 140 page no 3 n bits 80 index PRIMARY of table `wikidb`.`cuci_user` trx id 4340 lock_mode X
49Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
50 0: len 8; hex 73757072656d756d; asc supremum;;
51
522024-09-06 14:13:25 1067 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
53
54RECORD LOCKS space id 140 page no 3 n bits 80 index PRIMARY of table `wikidb`.`cuci_user` trx id 4340 lock_mode X insert intention waiting
55Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
56 0: len 8; hex 73757072656d756d; asc supremum;;
57
582024-09-06 14:13:25 1067 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)
59

[…]
Basically, I would get rid of all the locking reads. I don't understand what they are there for. Someone (@aaron?) had better explain it to me before I start randomly deleting READ_LOCKING options.

It was added in 11 July 2015 as part of T104615: Some account creations causing exceptions in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/224201, which says:

  • This helps […] multiple account creation attempts [when] the slave selected was lagged.
  • The locking better handles concurrent [account creation], especially with CentralAuth […]. This [name contention] is low enough […] to avoid gap locking issues. This will need to be confirmed with low error log rates to be sure.

It's not obvious how either of these would resolve the exceptions reported in that task, if indeed it does.

EDIT: The above patch from 2015 added READ_LOCKING to this query, which, today translates to the problematic "LOCK IN SHARE MODE". However, at the time of the above patch it actually got manually translated to FOR UPDATE.

This is not a simple mistake, because prior to 2014, the READ_LOCKING constant did indeed stand for "FOR UPDATE", which was changed in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/141609. Guess why? It was changed in order to... "Fix User::addToDatabase: hit a key conflict" by retroactively changing its meaning from "FOR UPDATE" to "LOCK IN SHARE MODE" (T68185).

EDIT 2: The above 2015 patch was changed 5 days later on 16 July 2015 at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/224647 from "FOR UPDATE" to "LOCK IN SHARE MODE".

Tweaked User::READ_LOCKING to use LOCK IN SHARE MODE

This better alignes with IDBAccessObject docs and still locks out INSERTs

Change #1083959 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] auth: Use READ_LATEST intead of READ_LOCKING for account creation checks

https://gerrit.wikimedia.org/r/1083959

Change #1083959 merged by jenkins-bot:

[mediawiki/core@master] auth: Use READ_LATEST intead of READ_LOCKING for account creation checks

https://gerrit.wikimedia.org/r/1083959

Assuming this is fixed.

Assuming this is fixed.

Given this bug has been blocking much of the parallelism efforts over the last few years, it'd be nice to have a confirmation. @edwardspec wrote a reproduction case, that seems sufficient to confirm it:

Testcase to reproduce

npm install
API=http://somewiki/w/api.php node ReproduceBug.js

Assuming this is fixed.

Given this bug has been blocking much of the parallelism efforts over the last few years, it'd be nice to have a confirmation.

I did not use the testcase above but I have at least two extensions where I had patches for selenium blocked because of that issues and these are fine now \o/. ( Anyone feel free to run that script, but IMO we're good here :-). )

If that got verified that is fine to me! Thank you for the fix and its approval!