Page MenuHomePhabricator

DBQueryError: Duplicate entry INSERT INTO `watchlist_expiry` (Function: WatchedItemStore::updateExpiriesAfterMove)
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error message
Error 1062: Duplicate entry 'XXXXXXX' for key 'PRIMARY' (10.64.48.35)
Function: WatchedItemStore::updateExpiriesAfterMove
Query: INSERT INTO `watchlist_expiry` (we_item,we_expiry) VALUES (XXXXXX, '20210212085219')
Stack Trace
#0 /srv/mediawiki/php-1.36.0-wmf.25/includes/libs/rdbms/database/Database.php(1683): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.36.0-wmf.25/includes/libs/rdbms/database/Database.php(1658): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#2 /srv/mediawiki/php-1.36.0-wmf.25/includes/libs/rdbms/database/Database.php(1227): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#3 /srv/mediawiki/php-1.36.0-wmf.25/includes/libs/rdbms/database/Database.php(2343): Wikimedia\Rdbms\Database->query(string, string, integer)
#4 /srv/mediawiki/php-1.36.0-wmf.25/includes/libs/rdbms/database/Database.php(2323): Wikimedia\Rdbms\Database->doInsert(string, array, string)
#5 /srv/mediawiki/php-1.36.0-wmf.25/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert(string, array, string)
#6 /srv/mediawiki/php-1.36.0-wmf.25/includes/libs/rdbms/database/DBConnRef.php(369): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.36.0-wmf.25/includes/watcheditem/WatchedItemStore.php(1692): Wikimedia\Rdbms\DBConnRef->insert(string, array, string)
#8 /srv/mediawiki/php-1.36.0-wmf.25/includes/deferred/MWCallableUpdate.php(38): WatchedItemStore->{closure}()
#9 /srv/mediawiki/php-1.36.0-wmf.25/includes/deferred/DeferredUpdates.php(467): MWCallableUpdate->doUpdate()
#10 /srv/mediawiki/php-1.36.0-wmf.25/includes/deferred/DeferredUpdates.php(344): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#11 /srv/mediawiki/php-1.36.0-wmf.25/includes/deferred/DeferredUpdates.php(278): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#12 /srv/mediawiki/php-1.36.0-wmf.25/includes/deferred/DeferredUpdates.php(194): DeferredUpdates::handleUpdateQueue(array, string, integer)
#13 /srv/mediawiki/php-1.36.0-wmf.25/includes/MediaWiki.php(1118): DeferredUpdates::doUpdates(string)
#14 /srv/mediawiki/php-1.36.0-wmf.25/includes/MediaWiki.php(854): MediaWiki->restInPeace()
#15 /srv/mediawiki/php-1.36.0-wmf.25/includes/MediaWiki.php(866): MediaWiki->{closure}()
#16 /srv/mediawiki/php-1.36.0-wmf.25/includes/MediaWiki.php(587): MediaWiki->doPostOutputShutdown()
#17 /srv/mediawiki/php-1.36.0-wmf.25/index.php(53): MediaWiki->run()
#18 /srv/mediawiki/php-1.36.0-wmf.25/index.php(46): wfIndexMain()
#19 /srv/mediawiki/w/index.php(3): require(string)
#20 {main}
Impact
Notes

Details

Request ID
X-9DkgpAICgAACHBWp8AAAET
Request URL
https://bn.wikipedia.org/w/index.php?title=xxxxxxxx&action=submit

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Krinkle renamed this task from Function: WatchedItemStore::updateExpiriesAfterMove to DBQueryError: Duplicate entry INSERT INTO `watchlist_expiry` (Function: WatchedItemStore::updateExpiriesAfterMove).Jan 13 2021, 7:28 PM
Krinkle set Request URL to https://bn.wikipedia.org/w/index.php?title=xxxxxxxx&action=submit.
Krinkle updated the task description. (Show Details)
Krinkle edited Stack Trace. (Show Details)
Krinkle moved this task from Untriaged to Jan 2021 on the Wikimedia-production-error board.
MusikAnimal subscribed.

I can't repro on my local, but I can reliably on production by moving https://en.wikipedia.org/wiki/User:MusikAnimal/sandbox. The only requirement is at least one person be temporarily watching the page (in my case that's me, the one moving the page). If I'm watching permanently, there's no error (because the other watchers to that page are watching permanently).

However I could not reproduce with https://en.wikipedia.org/wiki/User:MusikAnimal/sandbox7, even when I'm already watching the title of the move target (say it used to live there). I also tried watching with alt accounts to see whether the fact there are multiple watchers has any role.

So all in all, I can't explain what's going on here, but the obvious fix I think is to change the INSERT to have ON DUPLICATE KEY UPDATE, such that it updates the we_expiry with the new expiry.

Impact

The impact of this bug is that it's possible users temporarily watching a page will end up permanently watching the new page, and an orphan row is left in the watchlist_expiry table. However the orphan row will eventually be deleted by WatchedItemStore::removeExpired(). And in my testing, the expiry is still preserved, both for me and others temporarily watching the page, on both the old title and new title. The error is also thrown in a deferred update so the user never sees it.

Change 656251 had a related patch set uploaded (by MusikAnimal; owner: MusikAnimal):
[mediawiki/core@master] WatchedItemStore: Use REPLACE when inserting expiries after page move

https://gerrit.wikimedia.org/r/656251

Change 656251 merged by jenkins-bot:
[mediawiki/core@master] WatchedItemStore: Use REPLACE when inserting expiries after page move

https://gerrit.wikimedia.org/r/656251

dom_walden added subscribers: ifried, dom_walden.

I have not been able to reproduce the error on my local environment (MediaWiki 1.36.0-alpha (20e3a46)).

This change has been on group0 since yesterday. Looking at logstash, I cannot see this error occurring on group0 wikis since then. However, I don't know how to use logstash very well so perhaps someone who knows what they are doing could check.

@ifried I am moving this along because I think we have fixed the thing causing the production error. But, further to the questions for MusikAnimal below, there might be further bugs to be raised.

The impact of this bug is that it's possible users temporarily watching a page will end up permanently watching the new page, and an orphan row is left in the watchlist_expiry table. However the orphan row will eventually be deleted by WatchedItemStore::removeExpired(). And in my testing, the expiry is still preserved, both for me and others temporarily watching the page, on both the old title and new title. The error is also thrown in a deferred update so the user never sees it.

@MusikAnimal Speculating (with no evidence at the moment) as to the cause of this bug, could it be due to slow replication from the master to the replica databases?

If there is already a watchlist entry for $page and you move another page to $page, it replaces the entry in watchlist on master.

If updateExpiriesAfterMove runs a select on the replica database to find out the wl_id of the $page entry, it might get the old wl_id from before the replace.

This means the issue of orphan rows and pages incorrectly watched permanently you mentioned above could still happen, even if the production error does not occur.

I have tried to recreate this on my local environment, testwiki and mediawiki by repeatedly moving an article back and forward (e.g. from Article123 to Article123_moved and back again). But the temporary watch was maintained on both the pages. So I don't know if my hypothesis is actually accurate or possible.

Locally, it left behind a number of orphan watchlist_expiry rows. I guess if there is an existing row in watchlist which we replace we don't remove the old watchlist_expiry rows. But, as you say, the orphan rows will eventually be removed.

Speculating (with no evidence at the moment) as to the cause of this bug, could it be due to slow replication from the master to the replica databases?

If there is already a watchlist entry for $page and you move another page to $page, it replaces the entry in watchlist on master.

If updateExpiriesAfterMove runs a select on the replica database to find out the wl_id of the $page entry, it might get the old wl_id from before the replace.

updateExpiriesAfterMove should only be using DB_MASTER, so I don't think that's it.

No errors in the past week, verified on Logstash. We also removed the filter from mediawiki-new-errors dashboard. Closing as resolved, thanks for the help, all!