deadlocks mass-deleting media files in categories
Closed, ResolvedPublic

Description

When mass-deleting media files that are in a category, quite often deadlocks are seen originating from "Article::updateCategoryCounts".

Mass deletion is a relative term here: having opened 7 delete tabs in the browser for Wikimedia Commons and deleting them as quickly as possible (say within 2-3 seconds), I may get 3 deadlock out of 7. Would be a great help for maintaince if this was somehow resolved.


Version: 1.22.0
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=46086

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz13921.
siebrand created this task.Via LegacyMay 1 2008, 9:45 PM
siebrand added a comment.Via ConduitMay 18 2008, 2:02 PM

Issue still there. I also ran into another deadlock sotuation: removing 4 older versions of a file quickly after eachother: “LocalFileDeleteBatch::doDBDeletes” gives: “1213: Deadlock found when trying to get lock; Try restarting transaction (10.0.0.231)”.

aaron added a comment.Via ConduitSep 16 2008, 12:09 PM

Perhaps this could use the jobqueue to keep a nice linear execution order.

aaron added a comment.Via ConduitSep 16 2008, 12:21 PM

(In reply to comment #2)

Perhaps this could use the jobqueue to keep a nice linear execution order.

Looking at what it does, that shouldn't be n..

bzimport added a comment.Via ConduitFeb 16 2009, 6:03 PM

ayg wrote:

Ugh, I hate databases. :(

demon added a comment.Via ConduitMar 22 2011, 1:58 AM

Is still still reproducible?

siebrand added a comment.Via ConduitMar 22 2011, 6:58 AM

(In reply to comment #5)

Is still still reproducible?

I don't do any mass deletions anymore at Commons, but let me quickly check if there are enough speedy deletion tags at the moment to give it a try.

siebrand added a comment.Via ConduitMar 22 2011, 7:02 AM

(In reply to comment #6)

(In reply to comment #5)
> Is still still reproducible?

I don't do any mass deletions anymore at Commons, but let me quickly check if
there are enough speedy deletion tags at the moment to give it a try.

Cannot reproduce.

Reedy added a comment.Via ConduitApr 11 2011, 1:08 PM

r85785

Reedy added a comment.Via ConduitApr 11 2011, 6:46 PM

Reverted in r85814

siebrand added a comment.Via ConduitMay 17 2011, 3:09 PM

(In reply to comment #5)

Is still still reproducible?

As said, I haven't tried with images, but I just reproduced it mass deleting pages in the MediaWiki namespace on nl.wikimedia.org (1.17wmf1 (r88299)). The error thrown came from RecentChange::save and was:

1213: Deadlock found when trying to get lock; try restarting transaction (10.0.6.49)”.

AzaToth added a comment.Via ConduitMay 24 2011, 10:19 PM

When updating twinkle to api now, I'm getting database query errors a lot (10 in a minute depending on the amount being processed) which seems to be related to this:

<?xml version="1.0"?><api servedby="srv255"><error code="internal_api_error_DBQueryError" info="Database query error" xml:space="preserve">

#0 /usr/local/apache/common-local/php-1.17/includes/db/Database.php(734): DatabaseBase->reportQueryError('Deadlock found ...', 1213, 'UPDATE page ...', 'Title::invalida...', false)
#1 /usr/local/apache/common-local/php-1.17/includes/db/Database.php(1349): DatabaseBase->query('UPDATE page ...', 'Title::invalida...')
#2 /usr/local/apache/common-local/php-1.17/includes/Title.php(2485): DatabaseBase->update('page', Array, Array, 'Title::invalida...')
#3 /usr/local/apache/common-local/php-1.17/includes/Article.php(4093): Title->invalidateCache()
#4 /usr/local/apache/common-local/php-1.17/includes/Article.php(3271): Article::onArticleDelete(Object(Title))
#5 /usr/local/apache/common-local/php-1.17/includes/api/ApiDelete.php(154): Article->doDeleteArticle('Deleted talk pa...')
#6 /usr/local/apache/common-local/php-1.17/includes/api/ApiDelete.php(79): ApiDelete::delete(Object(Article), 'a0b48ef8649b3b5...', 'Deleted talk pa...')
#7 /usr/local/apache/common-local/php-1.17/includes/api/ApiMain.php(657): ApiDelete->execute()
#8 /usr/local/apache/common-local/php-1.17/includes/api/ApiMain.php(339): ApiMain->executeAction()
#9 /usr/local/apache/common-local/php-1.17/includes/api/ApiMain.php(323): ApiMain->executeActionWithErrorHandling()
#10 /usr/local/apache/common-local/php-1.17/api.php(115): ApiMain->execute()
#11 /usr/local/apache/common-local/live-1.5/api.php(3): require('/usr/local/apac...')
#12 {main}

</error></api>

siebrand added a comment.Via ConduitJan 6 2012, 11:44 AM

Just now I came across this again. Slightly different scenario -- mass deleting empty categories on Wikimedia Commons:

“RecentChange::save”: “1213: Deadlock found when trying to get lock; try restarting transaction (10.0.6.32)”.

siebrand added a comment.Via ConduitJan 6 2012, 11:52 AM

Adding to comment 12: Two more methods the deadlock was reported in:

  • WikiPage::updateCategoryCounts
  • HTMLCacheUpdate::invalidateTitles
aaron added a comment.Via ConduitApr 11 2013, 8:56 PM

(In reply to comment #13)

Adding to comment 12: Two more methods the deadlock was reported in:

  • WikiPage::updateCategoryCounts
  • HTMLCacheUpdate::invalidateTitles

These are the only two I'm still seeing in the logs a lot.

MZMcBride added a comment.Via ConduitApr 14 2013, 5:40 PM
tstarling added a comment.Via ConduitApr 15 2013, 1:16 AM
  • Bug 46086 has been marked as a duplicate of this bug. ***
tstarling added a comment.Via ConduitApr 15 2013, 1:29 AM

Log analysis and frequent reports on bug 46086 indicate that this has been happening more often since 1.21wmf11 was deployed. So, increasing priority.

tstarling added a comment.Via ConduitApr 15 2013, 1:32 AM

Created attachment 12105
Exception rate time series

Rate of exceptions with WikiPage->doDeleteUpdates in the backtrace. Clearly something new is going on, possibly correlated with deployment of MW 1.21wmf11 to commonswiki.

Attached:

tstarling added a comment.Via ConduitApr 15 2013, 3:05 AM

The transaction length is certainly epic. Here's an indicative debug log trace from my local test wiki, just showing the transaction in question, annotated with hook calls:

http://paste.tstarling.com/p/spMggU.html

35 DB queries, 59 hook calls, 4 squid purges and some FileBackend operations. It's not hard to imagine something in there being slow.

tstarling added a comment.Via ConduitApr 15 2013, 4:05 AM

What makes this tricky is the use of the $commit=false parameter to doDeleteArticleReal() by FileDeleteForm to attempt to roll back everything when a FileBackend operation fails. Of course, you can't unpurge Squid or unsend an IRC log line, but at least it does make an effort.

Moving things like squid purges and links updates to a DeferredUpdates job would be easy if it wasn't for this. You can't unqueue a DeferredUpdates job at present. Maybe you could clear the queue or somehow identify invalid jobs, but it would be pretty messy and heuristic. The use of onTransactionIdle() would suffer similar problems.

Ideally, WikiPage::doDeleteArticleReal() would be split up into a pre-commit function and a post-commit function, and FileDeleteForm would call the post-commit function after the file operations are successful.

WikiPage::doDeleteArticleReal() itself would continue to do both halves, for backwards compatibility, which would be fairly convenient in the short term since ApiDelete and maintenance/deleteBatch.php could continue to call it. But in the long term, those two callers would be best served by a new entry point that treated file and non-file deletions equivalently.

Assigning to Rob Lanphier for delegation.

RobLa-WMF added a comment.Via ConduitApr 15 2013, 4:46 PM

Giving this to AaronSchulz to look at.

MZMcBride added a comment.Via ConduitApr 16 2013, 2:45 AM

Related: https://gerrit.wikimedia.org/r/58820. Tim thinks this should help with about 80% of the problem.

Aklapper added a comment.Via ConduitMay 24 2013, 10:45 AM

Ariel: You silently reopened this - could you add a comment, e.g. how often / where this still can be seen? Thanks.

ArielGlenn added a comment.Via ConduitMay 24 2013, 11:38 AM

oops, I didn't mean to re-open. just to add myself. sorry! do what you need to do...

Aklapper added a comment.Via ConduitMay 24 2013, 2:10 PM

Alright :)

Add Comment