Page MenuHomePhabricator

Deadlock in SqlBagOStuff::modifyTableSpecificBlobsForSet
Open, HighPublicPRODUCTION ERROR

Description

Error
Error 1213 from SqlBagOStuff::modifyTableSpecificBlobsForSet, Deadlock found when trying to get lock; try restarting transaction...
normalized_message
Error 1213 from SqlBagOStuff::modifyTableSpecificBlobsForSet, {error} {sql1line} {db_server}
exception.trace
from /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/rdbms/database/Database.php(1217)
#0 /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/rdbms/database/Database.php(1194): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#1 /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/rdbms/database/Database.php(679): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/rdbms/database/DatabaseMySQL.php(557): Wikimedia\Rdbms\Database->query(Wikimedia\Rdbms\Query, string)
#3 /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/rdbms/database/DBConnRef.php(119): Wikimedia\Rdbms\DatabaseMySQL->upsert(string, array, array, array, string)
#4 /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/rdbms/database/DBConnRef.php(558): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#5 /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/rdbms/querybuilder/InsertQueryBuilder.php(334): Wikimedia\Rdbms\DBConnRef->upsert(string, array, array, array, string)
#6 /srv/mediawiki/php-1.41.0-wmf.28/includes/objectcache/SqlBagOStuff.php(596): Wikimedia\Rdbms\InsertQueryBuilder->execute()
#7 /srv/mediawiki/php-1.41.0-wmf.28/includes/objectcache/SqlBagOStuff.php(524): SqlBagOStuff->modifyTableSpecificBlobsForSet(Wikimedia\Rdbms\DBConnRef, string, double, array, array)
#8 /srv/mediawiki/php-1.41.0-wmf.28/includes/objectcache/SqlBagOStuff.php(319): SqlBagOStuff->modifyBlobs(array, double, array)
#9 /srv/mediawiki/php-1.41.0-wmf.28/includes/libs/objectcache/MediumSpecificBagOStuff.php(718): SqlBagOStuff->doSetMulti(array, integer, integer)
#10 /srv/mediawiki/php-1.41.0-wmf.28/includes/ResourceLoader/dependencystore/KeyValueDependencyStore.php(81): MediumSpecificBagOStuff->setMulti(array, integer, integer)
#11 /srv/mediawiki/php-1.41.0-wmf.28/includes/ResourceLoader/ResourceLoader.php(552): Wikimedia\DependencyStore\KeyValueDependencyStore->storeMulti(string, array, integer)
#12 /srv/mediawiki/php-1.41.0-wmf.28/includes/deferred/MWCallableUpdate.php(40): MediaWiki\ResourceLoader\ResourceLoader->MediaWiki\ResourceLoader\{closure}()
#13 /srv/mediawiki/php-1.41.0-wmf.28/includes/deferred/DeferredUpdates.php(463): MWCallableUpdate->doUpdate()
#14 /srv/mediawiki/php-1.41.0-wmf.28/includes/deferred/DeferredUpdates.php(192): DeferredUpdates::attemptUpdate(MWCallableUpdate)
#15 /srv/mediawiki/php-1.41.0-wmf.28/includes/deferred/DeferredUpdates.php(276): DeferredUpdates::run(MWCallableUpdate)
#16 /srv/mediawiki/php-1.41.0-wmf.28/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(MWCallableUpdate, integer)
#17 /srv/mediawiki/php-1.41.0-wmf.28/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
#18 /srv/mediawiki/php-1.41.0-wmf.28/includes/deferred/DeferredUpdates.php(295): DeferredUpdatesScope->processUpdates(integer, Closure)
#19 /srv/mediawiki/php-1.41.0-wmf.28/includes/MediaWiki.php(1169): DeferredUpdates::doUpdates()
#20 /srv/mediawiki/php-1.41.0-wmf.28/includes/MediaWiki.php(887): MediaWiki->restInPeace()
#21 /srv/mediawiki/php-1.41.0-wmf.28/load.php(61): MediaWiki->doPostOutputShutdown()
#22 /srv/mediawiki/php-1.41.0-wmf.28/load.php(39): wfLoadMain()
#23 /srv/mediawiki/w/load.php(3): require(string)
#24 {main}
Impact

Impacting Commons.

Notes

Details

Request URL
https://commons.wikimedia.org/w/load.php?lang=*&modules=*&only=*&skin=*

Event Timeline

Tgr renamed this task from Error 1213 from SqlBagOStuff::modifyTableSpecificBlobsForSet, {error} {sql1line} {db_server} to Deadlock in SqlBagOStuff::modifyTableSpecificBlobsForSet.EditedOct 6 2023, 1:06 AM
Tgr updated the task description. (Show Details)
Tgr subscribed.

~50 instances in the last 30 days, various wikis and MW versions, at a glance they all have the ResourceLoaderModule-dependencies cache key prefix.

Krinkle subscribed.

The caller to DependencyStore->storeMulti/SqlBagOStuff::doSetMulti is ResourceLoader::saveModuleDependenciesInternal (using a deferred update).

This caller specifically acquires a lock before doing the write, and skips any writes for which a lock could not be obtained (both if the lock is already held, and if memcached failed for any reason when trying to create a lock, this means some updates will be skipped even if nobody holds a lock. this is okay because DepStore is a self-correcting system). The reason for that don't matter right now, but the point is it is designed to be "really sure" that it is the only one holding the lock so that there can't be contention, even if it means nobody including itself is doing the update.

As such, it's quite surprising that this can deadlock. That makes me wonder whether the deadlock could be caused by something other than two requests writing the same key.

Could there be something about the way SqlBagOStuff expresses expiry conditions in its upsert queries, that makes MariaDB/MySQL lock rows other than the specified keys? I'm not sure what else it could be. Tagging ObjectCache lib as such.

I've checked the reqId in the task description, and it logs nothing from any other channel. I.e. no memcached failure about the lock (which, even if there was, per the above should return false and thus skip the update anyway, but would be interesting at least to correlate if it was coinciding with a memc failure).

Krinkle triaged this task as High priority.Oct 9 2023, 2:46 PM

Some background reading on deadlocks, as recommended by @aaron, that may be of use and interest:

https://dev.mysql.com/blog-archive/?author=Kuba%20%C5%81opusza%C5%84ski

Could there be something about the way SqlBagOStuff expresses expiry conditions in its upsert queries, that makes MariaDB/MySQL lock rows other than the specified keys?

Per https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html :

INSERT ... ON DUPLICATE KEY UPDATE differs from a simple INSERT in that an exclusive lock rather than a shared lock is placed on the row to be updated when a duplicate-key error occurs. An exclusive index-record lock is taken for a duplicate primary key value. An exclusive next-key lock is taken for a duplicate unique key value.

so it cannot. Can two multi-row inserts with multiple conflicting keys deadlock because they try to obtain the locks in a different order? Seems like something the DB engine could easily avoid but maybe it doesn't for performance reasons? But then, it seems like the BagOStuff::lock() calls earlier would prevent that.

In general, deadlocks are a PITA to debug because you only see one of the two conflicting queries so it's hard to guess what's happening. There was this old task: T128011: Consider installing pt-deadlock-logger, I wonder if it can be revisited to see if that or some similar tool can be used to get more information about deadlocks.

pmiazga subscribed.

@Ladsgroup - I'd like to ask for your help here - We have another ticket where we need to investigate the DB deadlocks - those happen from time to time in different parts of the MediaWiki. Do you know any possible solution (maybe the pt-deadlock-logger) that could help us debug those problems? It's not possible (most of cases) to reproduce those locally and trying to find out what caused those on prod is very tricky and usually fruitless.

There's also an innodb_print_all_deadlocks setting - maybe we could just try stashing all deadlock information in Logstash?

@Ladsgroup - I'd like to ask for your help here - We have another ticket where we need to investigate the DB deadlocks - those happen from time to time in different parts of the MediaWiki. Do you know any possible solution (maybe the pt-deadlock-logger) that could help us debug those problems? It's not possible (most of cases) to reproduce those locally and trying to find out what caused those on prod is very tricky and usually fruitless.

sorry for late response. What I/DBAs can do is to call SHOW ENGINE INNODB STATUS; which shows latest deadlock information. I pasted some of these for example T255706#8140816 or T256229#8140773 but none provided useful enough information to actually solve the issue. If you think it's useful, I can try to capture one and post it here.