Page MenuHomePhabricator

Unable to delete file pages on commons: MWException/LocalFileLockError: "Could not acquire lock"
Open, MediumPublic

Description

Unable to delete pages on commons:

April 4:

April 18:

May 7:

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

New error when deleting (related to this?):

API request failed (backend-fail-internal): An unknown error occurred in storage backend "local-multiwrite".

See also https://commons.wikimedia.org/wiki/Category:Deletion_bug and more complains on commons admin noticeboard by a number of admins.

Well thats a record.. 32 images in a row that I could not delete.

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1ApAMEEAAFnDRjEAAAAT] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:16 GMT</i> <u>served by mw1230</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1ApAEHQAACKc3hQAAACV] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:16 GMT</i> <u>served by mw1136</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1ApAANEAAAYvSyYAAADA] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:16 GMT</i> <u>served by mw1206</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1ApAEHcAAHlQ4xAAAAAF] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:17 GMT</i> <u>served by mw1139</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1ApAIEEAABnEJSIAAAAV] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:16 GMT</i> <u>served by mw1195</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAMEUAAFykz9YAAADA] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:17 GMT</i> <u>served by mw1234</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAMDsAAKhiu3YAAACV] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:17 GMT</i> <u>served by mw1224</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAMEUAAFykz9kAAADU] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1234</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAANAAAGov1S0AAAAC] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:17 GMT</i> <u>served by mw1205</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAIEEAABnEJYAAAAAY] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1195</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAEGMAABLuPmYAAAAI] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:17 GMT</i> <u>served by mw1119</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAEG8AAB9hgo0AAAAG] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1131</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1QpAIEAAAGLHi-8AAABQ] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1194</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAEGQAABOAL3YAAAAL] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1120</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAEF4AAAv7McsAAAAQ] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1114</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAEHgAAF6tlSQAAAAV] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1140</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAEF4AAAv7MdIAAAAV] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1114</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAEHkAADYKumUAAAAD] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1141</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAMD0AALnTKoAAAAAD] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1226</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAMEQAAAJXDVcAAACT] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1233</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAAM8AADKQVHIAAACR] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1204</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAID8AACY0pOgAAAAA] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1193</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAMCEAAE-7T3wAAACW] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:18 GMT</i> <u>served by mw1201</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1gpAMEAAAAaUlusAAAAE] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1229</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAMEMAAEsnRzkAAABW] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1232</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAANAAAGov1gkAAAAE] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1205</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAANIAADZpB2kAAACK] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1207</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAMEYAALDLiHIAAAAH] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1235</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAMCEAAE-7T@kAAACM] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1201</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAIEEAABnEJkUAAAAS] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1195</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAEHYAAG5ovIcAAACA] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:20 GMT</i> <u>served by mw1138</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAIEAAAGLHjNoAAABR] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:20 GMT</i> <u>served by mw1194</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAMCQAAGY8n1EAAAAW] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1208</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAIEAAAGLHjOgAAABK] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:20 GMT</i> <u>served by mw1194</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAEHEAACGaE1MAAACE] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:19 GMT</i> <u>served by mw1133</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL1wpAIDsAAENBGLIAAAAR] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:20 GMT</i> <u>served by mw1189</u>

API request failed (internal_api_error_Wikibase\Client\Usage\UsageTrackerException): [V0IL2ApAEGIAAEDx1hgAAAAK] Exception Caught: Got explicit BEGIN from Wikibase\Client\Usage\Sql\EntityUsageTable::removeUsages while atomic section(s) FileDeleteForm::doDelete are open. <i>at Sun, 22 May 2016 19:43:20 GMT</i> <u>served by mw1118</u>

@Riley_Huntley That sounds like a separate bug, Wikidata-related. Looks like it's filed as T135485.

Also happened four or five times on:

API request failed (internal_api_error_LocalFileLockError): [V0QTqwpAID8AAGdG7ocAAACU] Exception Caught: Could not acquire lock for 'AMARU-Patricia.jpg' (10.959584951401 sec)

before it just worked. Image preview wouldn't load on file description page, either.

Change 293268 had a related patch set uploaded (by Aaron Schulz):
Make LocalFileLockError and error page exception

https://gerrit.wikimedia.org/r/293268

Change 293268 merged by jenkins-bot:
Make LocalFileLockError an error page exception

https://gerrit.wikimedia.org/r/293268

The error message should be less terse now.

How often does this occur when another user is not currently deleting the file? The only way that could happen would be if the lock failed to release (maybe due to some transient redis error).

Change 296676 had a related patch set uploaded (by Aaron Schulz):
Increase redisLockManager read timeout from 1 to 2 seconds

https://gerrit.wikimedia.org/r/296676

Change 296678 had a related patch set uploaded (by Aaron Schulz):
Use a log group for LocalFile lock errors

https://gerrit.wikimedia.org/r/296678

Change 296815 had a related patch set uploaded (by Aaron Schulz):
Use a log group for LocalFile lock errors

https://gerrit.wikimedia.org/r/296815

Change 296678 merged by jenkins-bot:
Use a log group for LocalFile lock errors

https://gerrit.wikimedia.org/r/296678

Change 296815 merged by jenkins-bot:
Use a log group for LocalFile lock errors

https://gerrit.wikimedia.org/r/296815

This is still happening. I don't believe it's due to multiple people trying to delete the files, since the files remain after failure. Possibly relevant: the files where deletion fails universally take a _long_ time to generate the thumbnails, or simply don't generate thumbnails on the File: pages.

Trying again and again sometimes works, sometimes doesn't.

Recent examples:
[[:File:BREAKFAST (held by) NORDDEUTSCHER LLOYD BREMEN (at) EN ROUTE ABOARD DAMPFER BARBAROSSA (SS;) (NYPL Hades-276248-4000013535).jpg]]
Could not acquire lock for "mwstore://local-multiwrite/local-public/8/85/BREAKFAST_(held_by)_NORDDEUTSCHER_LLOYD_BREMEN_(at)_EN_ROUTE_ABOARD_DAMPFER_BARBAROSSA_(SS;)_(NYPL_Hades-276248-4000013535).jpg".

[[:File:FRUHSTUCK_(held_by)_NORDDEUTSCHER_LLOYD_BREMEN_(at)_DAMPFER;_H.H._MEIER_(SS;)_(NYPL_Hades-276205-4000013489).tiff]]
Could not acquire lock for "mwstore://local-multiwrite/local-public/d/d8/FRUHSTUCK_(held_by)_NORDDEUTSCHER_LLOYD_BREMEN_(at)_DAMPFER;_H.H._MEIER_(SS;)_(NYPL_Hades-276205-4000013489).tiff".

Example of it working after two failed attempts:
[[:File:LUNCH_(held_by)_NORDDEUTSCHER_LLOYD_BREMEN_(at)_"SCHNELLDAMPFER_"HOHENZOLLERN"_(SS;)_(NYPL_Hades-276088-4000013324).tiff]]

Logging indicates that the following happens:
a) many similarly named files are deleted at once, causing deadlocks in GlobalUsage::copyLocalImagelinks
b) this causes DB rollback and the unlockFiles() call in onTransactionIdle() never happens
c) RedisLockManager::destruct() *should* clear the locks at request end, but does not seem to trigger (it might be too late and the redis connection pool may be destroyed)
d) A bunch of locks are stuck for the 5 minute timeout for those files

Change 297284 had a related patch set uploaded (by Aaron Schulz):
Fixes to LocalFile::lock()

https://gerrit.wikimedia.org/r/297284

Change 296676 merged by jenkins-bot:
Increase redisLockManager read timeout from 1 to 2 seconds

https://gerrit.wikimedia.org/r/296676

Change 297284 merged by jenkins-bot:
Fixes to LocalFile::lock()

https://gerrit.wikimedia.org/r/297284

I don't think it's caused by similarity of names... The following just failed twice:

[[:File:B5c2309350800d06dc1efe7414da0fbc.jpg.jpg]]:
Could not acquire lock for "mwstore://local-multiwrite/local-public/f/f2/B5c2309350800d06dc1efe7414da0fbc.jpg.jpg".

[[:File:599full-genevieve-padalecki_.jpg]]:
Could not acquire lock for "mwstore://local-multiwrite/local-public/8/82/599full-genevieve-padalecki_.jpg".

These were the only two failures in a batch of ~70 deletions, and they failed again after a minute or so, before working on the third try just now as I'm typing.

According to kibana, these seem to be relatively rare now.

I can't couldn't delete https://commons.wikimedia.org/wiki/File:%D8%B3%DB%8C%D9%86%D8%A7.PNG

Could not acquire lock for "mwstore://local-multiwrite/local-public/4/40/سینا.PNG".

It was a dupe of another file (https://commons.wikimedia.org/w/index.php?title=File:Sina0.PNG&action=edit&redlink=1) which I successfully deleted a second before trying this one. Does not seem to be a fixed issue, so marking it as resolved because it is "rare" is misleading...

(Successfully deleted minutes after)

When does "because it is rare" became a reason to close tasks as resolved...

For random "can't lock", "lock wait timeout", and "deadlock" errors, it is normal to close bugs if they are rare, as long as nothing ends up corrupted.

Is there any other reason for reopening this? Is there excess or phantom contention?

Unfortunately, they seems back. Cite "I get at least 20 per week closing the daily DRs.".

Can be mass delete a cause?

(@aaron hope it is okay if i am going to re-open this, especially to gave a track - otherwise users will open new task and stuff is getting spitted :-) - otherwise feel free to re-close again.)

Steinsplitter changed the task status from Open to Stalled.Aug 7 2016, 7:12 AM

T141704 is something else?

aaron removed aaron as the assignee of this task.Aug 7 2016, 7:14 AM

@aaron can you check if this is still an issue, or if the related bug's solution also solved this problem? We're wondering whether the Multimedia team needs to pay attention to this or if it got fixed already.

2017-12-08 22:16:19 [WisPIwpAADoAAHf0y50AAADA] mw1263 commonswiki 1.31.0-wmf.11 exception ERROR: [WisPIwpAADoAAHf0y50AAADA] /w/index.php?title=File:Test.svg&action=delete   Wikimedia\Rdbms\DBQueryError from line 1179 of /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
Query: SELECT  page_latest  FROM `page`    WHERE page_id = '50159152' AND page_namespace = '6' AND page_title = 'Test.svg'  LIMIT 1   FOR UPDATE
Function: WikiPage::lockAndGetLatest
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.48.23)
 {"exception_id":"WisPIwpAADoAAHf0y50AAADA","exception_url":"/w/index.php?title=File:Test.svg&action=delete","caught_by":"mwe_handler"}
[Exception Wikimedia\Rdbms\DBQueryError] (/srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/database/Database.php:1179) A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?
Query: SELECT  page_latest  FROM `page`    WHERE page_id = '50159152' AND page_namespace = '6' AND page_title = 'Test.svg'  LIMIT 1   FOR UPDATE
Function: WikiPage::lockAndGetLatest
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.48.23)

  #0 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/database/Database.php(991): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
  #1 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/database/Database.php(1374): Wikimedia\Rdbms\Database->query(string, string)
  #2 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/database/Database.php(1200): Wikimedia\Rdbms\Database->select(string, string, array, string, array, array)
  #3 /srv/mediawiki/php-1.31.0-wmf.11/includes/page/WikiPage.php(2994): Wikimedia\Rdbms\Database->selectField(string, string, array, string, array)
  #4 /srv/mediawiki/php-1.31.0-wmf.11/includes/page/WikiPage.php(2815): WikiPage->lockAndGetLatest()
  #5 /srv/mediawiki/php-1.31.0-wmf.11/includes/FileDeleteForm.php(197): WikiPage->doDeleteArticleReal(string, boolean, integer, boolean, string, User, array)
  #6 /srv/mediawiki/php-1.31.0-wmf.11/includes/FileDeleteForm.php(119): FileDeleteForm::doDelete(Title, LocalFile, string, string, boolean, User)
  #7 /srv/mediawiki/php-1.31.0-wmf.11/includes/page/ImagePage.php(992): FileDeleteForm->execute()
  #8 /srv/mediawiki/php-1.31.0-wmf.11/includes/actions/DeleteAction.php(46): ImagePage->delete()
  #9 /srv/mediawiki/php-1.31.0-wmf.11/includes/MediaWiki.php(499): DeleteAction->show()
--
2017-12-08 22:16:19 [WisPIwpAADoAAHf0y50AAADA] mw1263 commonswiki 1.31.0-wmf.11 exception ERROR: [WisPIwpAADoAAHf0y50AAADA] /w/index.php?title=File:Test.svg&action=delete   Wikimedia\Rdbms\DBTransactionError from line 1213 of /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Explicit transaction still active. A caller may have caught an error. {"exception_id":"WisPIwpAADoAAHf0y50AAADA","exception_url":"/w/index.php?title=File:Test.svg&action=delete","caught_by":"mwe_handler"}
[Exception Wikimedia\Rdbms\DBTransactionError] (/srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/loadbalancer/LoadBalancer.php:1213) Explicit transaction still active. A caller may have caught an error.
  #0 [internal function]: Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges(Wikimedia\Rdbms\DatabaseMysqli)
  #1 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1556): call_user_func_array(Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges;352, array)
  #2 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1233): Wikimedia\Rdbms\LoadBalancer->forEachOpenMasterConnection(Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges;352)
  #3 [internal function]: Wikimedia\Rdbms\LoadBalancer->approveMasterChanges(array)
  #4 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/lbfactory/LBFactory.php(184): call_user_func_array(array, array)
  #5 [internal function]: Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod(Wikimedia\Rdbms\LoadBalancer, string, array)
  #6 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(425): call_user_func_array(Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod;244, array)
  #7 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/lbfactory/LBFactory.php(187): Wikimedia\Rdbms\LBFactoryMulti->forEachLB(Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod;244, array)
  #8 /srv/mediawiki/php-1.31.0-wmf.11/includes/libs/rdbms/lbfactory/LBFactory.php(224): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod(string, array)
  #9 /srv/mediawiki/php-1.31.0-wmf.11/includes/MediaWiki.php(881): Wikimedia\Rdbms\LBFactory->commitMasterChanges(string)
  #10 /srv/mediawiki/php-1.31.0-wmf.11/includes/MediaWiki.php(719): MediaWiki->restInPeace(string, boolean)
  #11 [internal function]: Closure$MediaWiki::doPostOutputShutdown()
  #12 {main}

I guess Test.svg is a really problem (limit) indicator, as this has the ongoing growing longest file history ever (+ already deletions).

Aklapper changed the task status from Stalled to Open.May 19 2020, 12:42 PM

The previous comments don't explain what/who exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks...Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task is out of scope and nobody should ever work on this, then task status should have the "Declined" status.)

I've been repeatedly running into this issue. (During upload, not deletion, like in T129621 which was marked as a duplicate of this.)

For ~3 days now, each time I try to upload a new version for File:Enthusiasm_(1931)_by_Dziga_Vertov.webm on Commons, it gets a lock error. I believe the first error was a generic "Server failed to publish temporary file", and since then retrying a new upload will fail.

These were the API error responses from August 15th, 14th, and 13th:

{"error":{"code":"stashfailed","info":"Could not acquire lock. Somebody else is doing something to this file.","*":"See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes."},"servedby":"mw-api-ext.eqiad.main-5744d5b77-g77vk"}
{"error":{"code":"stashfailed","info":"Could not acquire lock. Somebody else is doing something to this file.","*":"See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes."},"servedby":"mw-api-ext.eqiad.main-64795c67cd-flhmn"}
{"error":{"code":"stashfailed","info":"Could not acquire lock. Somebody else is doing something to this file.","*":"See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes."},"servedby":"mw-api-ext.eqiad.main-77975868c8-2dbxd"}

I was also getting lock errors trying to upload a new version to File:His_Girl_Friday_(1940,_SDR).webm which I think were the same issue, but I didn't save those.

Yes, happens quite often during uploading.

<?xml version="1.0"?><api servedby="mw-api-ext.codfw.main-549884f474-dxcr5"><error code="lockmanager-fail-conflict" info="Could not acquire lock. Somebody else is doing something to this file." filekey="1bc1f1uutgg8.alkrsu.6539091.pdf" sessionkey="1bc1f1uutgg8.alkrsu.6539091.pdf" xml:space="preserve">See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &amp;lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&amp;gt; for notice of API deprecations and breaking changes.</error></api>

Usually accompanied by another error (in a concurrent upload request for a different file)

<?xml version="1.0"?><api servedby="mw-api-ext.codfw.main-549884f474-rrgm6"><error code="internal_api_error_DBQueryError" info="[01a352e1-a573-4977-a7b7-89eaa3a282e2] Caught exception of type Wikimedia\Rdbms\DBQueryError" errorclass="Wikimedia\Rdbms\DBQueryError" /></api>
HCoplin-WMF lowered the priority of this task from High to Medium.May 7 2026, 11:45 AM
HCoplin-WMF added subscribers: Ladsgroup, HCoplin-WMF.

Lowering the priority until we have more info. I also have a few questions:

  1. Can we confirm this is still happening? I assume it is, since it's periodically come up for ~10 years at this point.
  2. Do we have a sense of how commonly this is happening overall? There are scattered reports, but it would be nice if we could see the overall rate of errors.
  3. Can we confirm that this is the same lock issue for uploading a new version (per recent comments) and deleting?
  4. What are the steps to recover from this, when it occurs? Does an immediate retry typically help? Trying again after a few minutes?

Tagging @Ladsgroup for vis and possible context, too.

I'm rebuilding and revamping the redis lock manager logic which should solve all sorts of race conditions such as the ones above. We can revisit this once those changes go live (hopefully soon)

If anyone from mw teams could review and merge this patch, I'd be really grateful https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1279499 It should help with the improving of the infrastructure a lot.

Change #1296008 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] [WIP] filerepo: use "nonLocking" operations in LocalRepo

https://gerrit.wikimedia.org/r/1296008

aaron updated Other Assignee, added: aaron.

The only thing that throws LocalFileLockError is the unused LockFile::lock() method. I assume there error we are looking for is mostly lockmanager-fail-conflict now. We don't enable the LockManager SPI channel in production.

I see that the various acquireFileLock() callers do not use any sort of ScopedCallback (left out from T283045), so any uncaught errors will cause the releaseFileLock() call to not happen. RedisLockManager::__destruct() will try to release such locks at the end of the request (even after MediaWiki shutdown and post-send methods). Maybe:

  • there are a lot of slow post-send updates making it take a long time for __destruct to be reached. During this time, retries will fail to acquire the locks.
  • hard 5min timeout causes the release to not happen. Note that the locks take 10min to expire given <<2 * ceil( RequestTimeout::singleton()->getWallTimeLimit() ) >>.
  • doLockingRequestBucket fails to distinguish a "server error" from "lock conflict" for a now-dropped connection and << $status->merge( $this->doUnlockByType( $lockedPaths ) ); >> fails to do anything in terms of releasing locks from the other peer servers since locksHeld wasn't set yet.
  • doLockingRequestBucket generally fails to cleanup locks if a quorum is not reached due to whatever I/O or connection error.
aaron updated Other Assignee, removed: aaron.