Page MenuHomePhabricator

RuntimeException: Could not acquire lock for page ID (via LinksUpdate)
Open, LowPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   RuntimeException: Could not acquire lock for page ID '9277282'.
exception.trace
from /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/LinksUpdate.php(189)
#0 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/DeferredUpdates.php(513): LinksUpdate->doUpdate()
#1 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/RefreshSecondaryDataUpdate.php(101): DeferredUpdates::attemptUpdate(LinksUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#2 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/DeferredUpdates.php(513): RefreshSecondaryDataUpdate->doUpdate()
#3 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/DeferredUpdates.php(390): DeferredUpdates::attemptUpdate(RefreshSecondaryDataUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#4 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/DeferredUpdates.php(221): DeferredUpdates::run(RefreshSecondaryDataUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#5 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/DeferredUpdatesScope.php(264): DeferredUpdates::{closure}(RefreshSecondaryDataUpdate, integer)
#6 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
#7 /srv/mediawiki/php-1.37.0-wmf.3/includes/deferred/DeferredUpdates.php(242): DeferredUpdatesScope->processUpdates(integer, Closure)
#8 /srv/mediawiki/php-1.37.0-wmf.3/includes/MediaWiki.php(1132): DeferredUpdates::doUpdates(string)
#9 /srv/mediawiki/php-1.37.0-wmf.3/includes/MediaWiki.php(837): MediaWiki->restInPeace()
#10 /srv/mediawiki/php-1.37.0-wmf.3/includes/MediaWiki.php(575): MediaWiki->doPostOutputShutdown()
#11 /srv/mediawiki/php-1.37.0-wmf.3/index.php(53): MediaWiki->run()
#12 /srv/mediawiki/php-1.37.0-wmf.3/index.php(46): wfIndexMain()
#13 /srv/mediawiki/w/index.php(3): require(string)
#14 {main}
Impact

Low, I think, but after last week's "this looks minor" incident, I'm not sure. There's a few dozen of this kind of message over the past few hours. I think it would be good for a DBA to have a look.

Notes

Details

Request URL
https://ru.wikinews.org/w/index.php?title=*&action=submit

Event Timeline

jcrespo subscribed.

@LarsWirzenius --

Unlike last week's, which was a MySQL Operational Error (too many connections), this is a query error, for which DBAs may not be able to do anything about it. The right tag for these would be Wikimedia-database-issue of which the ownership is at the moment a bit vague, but on the Mediawiki side (query pattern/performance optimization), not infrastructure (db servers unavailable). The difference, while subtle, is that, from the DBAs point of view, this is not an error, but an expected outcome (it may not be for the database/extension developer). Of course, like with any error, the immediate trigger may result on a deeper reason we could help with- eg. a query error could be later seen as caused by a database server too slow to respond, but the first analysis should probably have to do this distinction.

About this error --
For the record, we have seen lately an increasing number of "Could not acquire lock for page ID" and "User::loadFromDatabase: Lock wait timeout exceeded; try restarting transaction" errors, these are normally due to contention on a particular workflow due to quick edits/actions on the same page/user/bot. Not sure if platform or performance would like to have a look and decide if it is worth optimizing the critical path of the lock causing the contention, there or it is within the margin of availability (error budget). Sometimes it could be related to architecture changes, sometimes it is just traffic-caused. It is for them to say.

I just saw this comment on IRC:

<Amir1> 98.2% of ruwikinews (17M rows) watchlist table is just a bot

Could it be that there was some mass deletion that could have caused temporary slowdown on certain queries? Not that I am aware of any, but I think it is relevant to add @Ladsgroup as he happened to mention this small wiki at the same time the errors happen?

Krinkle subscribed.

This is an issue with LinksUpdate. I think it may be a duplicate but I couldn't find the task I was thinking of. Perhaps it was resolved and came back.

I explicitly asked the user not to clean their watchlist so I doubt that would have caused it: https://meta.wikimedia.org/wiki/User_talk:Krassotkin#Your_bot's_watchlist_in_ruwikinews

I haven't started my script yet (as the user haven't responded to me yet) but when I start, it'll be just 2k rows every five seconds, it's really really low-weight.

This is still occurring albeit at a rather low frequency. I have added a log filter to the mediawiki-new-errors dashboard.

I just saw this comment on IRC:

<Amir1> 98.2% of ruwikinews (17M rows) watchlist table is just a bot

Could it be that there was some mass deletion that could have caused temporary slowdown on certain queries? Not that I am aware of any, but I think it is relevant to add @Ladsgroup as he happened to mention this small wiki at the same time the errors happen?

That should be fixed for new bots by fix of T258108

I haven't started my script yet (as the user haven't responded to me yet) but when I start, it'll be just 2k rows every five seconds, it's really really low-weight.

The user cannot longer response. But maybe the deletion is also done as part of T258098.

I haven't started my script yet (as the user haven't responded to me yet) but when I start, it'll be just 2k rows every five seconds, it's really really low-weight.

The user cannot longer response. But maybe the deletion is also done as part of T258098.

It has been cleaned a while ago. I just double checked:

mysql:research@s3-analytics-replica.eqiad.wmnet [ruwikinews]> select count(*) from watchlist;
+----------+
| count(*) |
+----------+
|   348099 |
+----------+
1 row in set (0.080 sec)
brennen subscribed.

Still occurring as of 1.38.0-wmf.19 (T293960).

It seems the new one is a bit different and happening with insertion of wikidata terms to term store.

Krinkle renamed this task from RuntimeException: Could not acquire lock for page ID '9277282'. to RuntimeException: Could not acquire lock for page ID (via LinksUpdate).Jun 8 2023, 4:33 PM