Wikidata's wb_items_per_site table has suddenly disappeared, creating DBQueryErrors on page views
Closed, Resolved | Public | PRODUCTION ERROR

Assigned To
Authored By
DannyS712
Apr 6 2020, 11:05 PM
Referenced Files
F31738002: Screenshot_20200406-182953_Chrome.jpg
Apr 7 2020, 1:31 AM
F31737813: Screenshot 2020-04-07 at 01.06.45.png
Apr 6 2020, 11:19 PM

Description

Caused by T157651: sql.php must not run LoadExtensionSchemaUpdates and T249598: Wikibase schema updaters must not modify database directly

Incident documentation underway

At 23:00:00 UTC, a puppet-managed cron job ran cron/wikibase/dumpwikibaserdf.sh, which calls cron/wikibase/wikibasedumps-shared.sh, which in turn calls sql.php to run an ad-hoc query.

Because of the bugs mentioned above, sql.php then dropped an important table (wb_items_per_site) in production.

Original report:

Error: 1146 Table 'wikidatawiki.wb_items_per_site' doesn't exist (10.64.0.96)

https://meta.wikimedia.org/wiki/Steward_requests/Global [Xou1VgpAMNIAA1mYvDEAAAIS] 2020-04-06 23:03:50: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

https://meta.wikimedia.org/wiki/Wikipedia_15/Get_Involved/el [73a46ca0-a4d7-4265-86f9-9dc923efb4e1] 2020-04-06 23:04:32: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

https://en.wikipedia.org/wiki/Main_Page [Xou16QpAAMEAAtSGcnQAAABM] 2020-04-06 23:06:18: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

Details

Request ID
Xou1VgpAMNIAA1mYvDEAAAIS
Request URL
https://meta.wikimedia.org/wiki/Steward_requests/Global
Stack Trace
2020-04-06 23:03:50 [Xou1VgpAMNIAA1mYvDEAAAIS] mw1368 metawiki 1.35.0-wmf.26 exception ERROR: [Xou1VgpAMNIAA1mYvDEAAAIS] /wiki/SRG   Wikimedia\Rdbms\DBQueryError from line 1619 of /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  ips_item_id  FROM `wb_items_per_site`    WHERE ips_site_id = 'metawiki' AND ips_site_page = 'Steward requests/Global'  LIMIT 1  
Function: Wikimedia\Rdbms\Database::selectRow
Error: 1146 Table 'wikidatawiki.wb_items_per_site' doesn't exist (10.64.0.96)
 {"exception_id":"Xou1VgpAMNIAA1mYvDEAAAIS","exception_url":"/wiki/SRG","caught_by":"mwe_handler"} 
[Exception Wikimedia\Rdbms\DBQueryError] (/srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/Database.php:1619) A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  ips_item_id  FROM `wb_items_per_site`    WHERE ips_site_id = 'metawiki' AND ips_site_page = 'Steward requests/Global'  LIMIT 1  
Function: Wikimedia\Rdbms\Database::selectRow
Error: 1146 Table 'wikidatawiki.wb_items_per_site' doesn't exist (10.64.0.96)

  #0 /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/Database.php(1603): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
  #1 /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/Database.php(1580): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
  #2 /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/Database.php(1159): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
  #3 /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/Database.php(1819): Wikimedia\Rdbms\Database->query(string, string)
  #4 /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/Database.php(1915): Wikimedia\Rdbms\Database->select(string, array, array, string, array, array)
  #5 /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->selectRow(string, array, array)
  #6 /srv/mediawiki/php-1.35.0-wmf.26/includes/libs/rdbms/database/DBConnRef.php(331): Wikimedia\Rdbms\DBConnRef->__call(string, array)
  #7 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/lib/includes/Store/Sql/SiteLinkTable.php(266): Wikimedia\Rdbms\DBConnRef->selectRow(string, array, array)
  #8 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/lib/includes/Store/CachingSiteLinkLookup.php(147): Wikibase\Lib\Store\Sql\SiteLinkTable->getItemIdForLink(string, string)
  #9 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/lib/includes/Store/CachingSiteLinkLookup.php(75): Wikibase\Lib\Store\CachingSiteLinkLookup->getAndCacheItemIdForLink(string, string)
  #10 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/client/includes/LangLinkHandler.php(101): Wikibase\Lib\Store\CachingSiteLinkLookup->getItemIdForLink(string, string)
  #11 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/client/includes/LangLinkHandler.php(332): Wikibase\Client\LangLinkHandler->getEntityLinks(Title)
  #12 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/client/includes/LangLinkHandler.php(353): Wikibase\Client\LangLinkHandler->getEffectiveRepoLinks(Title, ParserOutput)
  #13 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/client/includes/Hooks/ParserOutputUpdateHookHandlers.php(97): Wikibase\Client\LangLinkHandler->addLinksFromRepository(Title, ParserOutput)
  #14 /srv/mediawiki/php-1.35.0-wmf.26/extensions/Wikibase/client/includes/Hooks/ParserOutputUpdateHookHandlers.php(65): Wikibase\Client\Hooks\ParserOutputUpdateHookHandlers->doContentAlterParserOutput(Title, ParserOutput)
  #15 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(174): Wikibase\Client\Hooks\ParserOutputUpdateHookHandlers::onContentAlterParserOutput(WikitextContent, Title, ParserOutput)
  #16 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
  #17 /srv/mediawiki/php-1.35.0-wmf.26/includes/content/AbstractContent.php(569): Hooks::run(string, array)
  #18 /srv/mediawiki/php-1.35.0-wmf.26/includes/Revision/RenderedRevision.php(267): AbstractContent->getParserOutput(Title, integer, ParserOptions, boolean)
  #19 /srv/mediawiki/php-1.35.0-wmf.26/includes/Revision/RenderedRevision.php(236): MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(WikitextContent, boolean)
  #20 /srv/mediawiki/php-1.35.0-wmf.26/includes/Revision/RevisionRenderer.php(215): MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string)
  #21 /srv/mediawiki/php-1.35.0-wmf.26/includes/Revision/RevisionRenderer.php(152): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
  #22 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
  #23 /srv/mediawiki/php-1.35.0-wmf.26/includes/Revision/RenderedRevision.php(198): call_user_func(Closure, MediaWiki\Revision\RenderedRevision, array)
  #24 /srv/mediawiki/php-1.35.0-wmf.26/includes/poolcounter/PoolWorkArticleView.php(196): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
  #25 /srv/mediawiki/php-1.35.0-wmf.26/includes/poolcounter/PoolCounterWork.php(125): PoolWorkArticleView->doWork()
  #26 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(787): PoolCounterWork->execute()
  #27 /srv/mediawiki/php-1.35.0-wmf.26/includes/actions/ViewAction.php(66): Article->view()
  #28 /srv/mediawiki/php-1.35.0-wmf.26/includes/MediaWiki.php(519): ViewAction->show()
  #29 /srv/mediawiki/php-1.35.0-wmf.26/includes/MediaWiki.php(305): MediaWiki->performAction(Article, Title)
  #30 /srv/mediawiki/php-1.35.0-wmf.26/includes/MediaWiki.php(973): MediaWiki->performRequest()
  #31 /srv/mediawiki/php-1.35.0-wmf.26/includes/MediaWiki.php(535): MediaWiki->main()
  #32 /srv/mediawiki/php-1.35.0-wmf.26/index.php(47): MediaWiki->run()
  #33 /srv/mediawiki/w/index.php(3): require(string)
  #34 {main}

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 587231 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCacheValue entries during wb_items_per_site drop incident

https://gerrit.wikimedia.org/r/587231

Mentioned in SAL (#wikimedia-operations) [2020-04-07T13:55:21Z] <addshore@deploy1001> sync-file aborted: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 29s)

Mentioned in SAL (#wikimedia-operations) [2020-04-07T13:57:04Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: REVERT T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 58s)

Change 587264 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] RejectParserCacheValue entries during wb_items_per_site drop incident 2

https://gerrit.wikimedia.org/r/587264

Change 587264 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCacheValue entries during wb_items_per_site drop incident 2

https://gerrit.wikimedia.org/r/587264

Mentioned in SAL (#wikimedia-operations) [2020-04-07T14:08:03Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (1h) take 2 (duration: 00m 57s)

Change 587267 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[operations/mediawiki-config@master] RejectParserCacheValue entries during wb_items_per_site drop incident: namespace check

https://gerrit.wikimedia.org/r/587267

Change 587267 abandoned by DannyS712:
RejectParserCacheValue entries during wb_items_per_site drop incident: namespace check

https://gerrit.wikimedia.org/r/587267

Change 587268 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 2/14.5 hours

https://gerrit.wikimedia.org/r/587268

Change 587268 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 2/14.5 hours

https://gerrit.wikimedia.org/r/587268

Mentioned in SAL (#wikimedia-operations) [2020-04-07T14:15:52Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (2/14.5h) (duration: 00m 58s)

Change 587271 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 4/14.5 hours

https://gerrit.wikimedia.org/r/587271

Change 587271 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 4/14.5 hours

https://gerrit.wikimedia.org/r/587271

Mentioned in SAL (#wikimedia-operations) [2020-04-07T14:25:34Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (4/14.5h) (duration: 00m 58s)

Change 587273 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 8/14.5 hours

https://gerrit.wikimedia.org/r/587273

Change 587273 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 8/14.5 hours

https://gerrit.wikimedia.org/r/587273

Mentioned in SAL (#wikimedia-operations) [2020-04-07T14:35:02Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (8/14.5h) (duration: 00m 58s)

Change 587277 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 10/14.5 hours

https://gerrit.wikimedia.org/r/587277

Change 587277 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 10/14.5 hours

https://gerrit.wikimedia.org/r/587277

Mentioned in SAL (#wikimedia-operations) [2020-04-07T14:56:27Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (10/14.5h) (duration: 00m 55s)

Change 587280 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 12/14.5 hours

https://gerrit.wikimedia.org/r/587280

Change 587280 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 12/14.5 hours

https://gerrit.wikimedia.org/r/587280

Mentioned in SAL (#wikimedia-operations) [2020-04-07T15:17:52Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (12/14.5h) (duration: 01m 00s)

Some logged out users will still see potentially outdated pages cached in varnish for a time.

Adding that varnish TTL is 24 hours, so logged out users might see wrong data for the next 24 hours.

@Ladsgroup you sure? I thought that the s-maxage sent to Varnish was 2 weeks:
< cache-control: s-maxage=1209600, must-revalidate, max-age=0

OK, so.

  • Table on Wikidata is correct.
  • Render of pages on Wikidata is correct.
  • Render of pages on Wikidata clients is being progressively fixed.

Can we declare this no longer UBN / a train blocker?

Is it the time to unblock EmausBot?

I don't think so, at least for now; better to err on the side of caution

Just for the record, as the admin who performed the block, I have no objection to any other admin unblocking, once there's agreement that it's appropriate to do so.

I think the people who have been working on this since yesterday are taking a well-deserved rest and may be unresponsive. From the data recovery/database perspective, everything is done and this is no longer UBN. However, someone familiar with Wikidata should give the final OK, as the cache issues may still be affecting some served pages, with a long tail.

Just for the record, as the admin who performed the block, I have no objection to any other admin unblocking, once there's agreement that it's appropriate to do so.

...I was the one who blocked the bot https://www.wikidata.org/w/index.php?title=Special:Log/block&page=User%3AEmausBot

I think Roy was talking about enwp:
https://en.wikipedia.org/w/index.php?title=Special:Log&page=User%3AEmausBot&type=block

An 'all-clear' from the Wikidata side would be useful. I've pulled the plug on Pi bot to avoid that causing problems, would be good to know when I can plug it back in.

Oh, I blocked it on enwiki. I didn't realize that was the wrong place. Should I just unblock it there?

On enwiki it just fixes double redirects, so it should be fine there and I suggest unblocking

Change 587303 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 14.5/14.5 hours

https://gerrit.wikimedia.org/r/587303

Addshore lowered the priority of this task from Unbreak Now! to High. (Apr 7 2020, 5:52 PM)

I would not consider this a train blocker any more

Change 587303 merged by jenkins-bot:
[operations/mediawiki-config@master] RejectParserCache entries for wb_items_per_site 14.5/14.5 hours

https://gerrit.wikimedia.org/r/587303

Mentioned in SAL (#wikimedia-operations) [2020-04-07T17:54:03Z] <addshore@deploy1001> sync-file aborted: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) (duration: 01m 16s)

On enwiki it just fixes double redirects, so it should be fine there and I suggest unblocking

I'll hold off on unblocking for now. Fixing double-redirects seems really low on the priority list compared to making sure everything else really is stable.

Mentioned in SAL (#wikimedia-operations) [2020-04-07T17:55:21Z] <addshore@deploy1001> Synchronized wmf-config/CommonSettings.php: T249565 T249595 RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) retry (duration: 01m 02s)
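
The staged config changes above (1h, then 2, 4, 8, 10, 12 and finally 14.5 of 14.5 hours) can be read as progressively widening a time window within which parser cache entries are rejected and re-rendered. The following is a minimal Python sketch of that idea, not the actual CommonSettings.php hook. The 23:00 UTC start comes from the task description; the assumption that each step covers the first N hours of the window, and the names `INCIDENT_START`, `should_reject` and `first_step_rejecting`, are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the incident window opens when the cron job fired (23:00 UTC, 2020-04-06).
INCIDENT_START = datetime(2020, 4, 6, 23, 0, tzinfo=timezone.utc)

# Step sizes in hours, taken from the commit titles in this timeline.
ROLLOUT_STEPS_HOURS = [1, 2, 4, 8, 10, 12, 14.5]


def should_reject(cache_time, covered_hours):
    """Return True if an entry rendered at cache_time falls inside the
    part of the incident window covered so far (assumed: the first N hours)."""
    window_end = INCIDENT_START + timedelta(hours=covered_hours)
    return INCIDENT_START <= cache_time < window_end


def first_step_rejecting(cache_time):
    """Return the first rollout step (in hours) that would reject the entry,
    or None if the entry was rendered outside the full window."""
    for step in ROLLOUT_STEPS_HOURS:
        if should_reject(cache_time, step):
            return step
    return None
```

Widening the window in steps, rather than rejecting all 14.5 hours at once, would presumably spread the resulting re-parse load over time; that motivation is an inference from the timeline, not something stated in the task.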

Some logged out users will still see potentially outdated pages cached in varnish for a time.

Adding that varnish TTL is 24 hours, so logged out users might see wrong data for the next 24 hours.

@Ladsgroup you sure? I thought that the s-maxage sent to Varnish was 2 weeks:
< cache-control: s-maxage=1209600, must-revalidate, max-age=0

Per https://wikitech.wikimedia.org/wiki/Varnish#TTL varnish cache TTL is 24 hours
Also for text listed in https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/cache/text.yaml#L197

I would not consider this a train blocker any more

Removing from blocker, then.

Mentioned in SAL (#wikimedia-operations) [2020-04-07T19:45:21Z] <hoo> Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of T249565)

Mentioned in SAL (#wikimedia-operations) [2020-04-07T20:08:58Z] <hoo> (Take 2) Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of T249565)

Mentioned in SAL (#wikimedia-operations) [2020-04-07T20:34:22Z] <hoo> (Take 3) Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of T249565)

Change 587389 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Revert "maintenance: Remove sql.php temporarily"

https://gerrit.wikimedia.org/r/587389

Some logged out users will still see potentially outdated pages cached in varnish for a time.

Adding that varnish TTL is 24 hours, so logged out users might see wrong data for the next 24 hours.

@Ladsgroup you sure? I thought that the s-maxage sent to Varnish was 2 weeks:
< cache-control: s-maxage=1209600, must-revalidate, max-age=0

Per https://wikitech.wikimedia.org/wiki/Varnish#TTL varnish cache TTL is 24 hours
Also for text listed in https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/cache/text.yaml#L197

It has changed recently, for example see https://gerrit.wikimedia.org/r/c/operations/puppet/+/352826/
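
The two figures in this exchange need not contradict each other: the `s-maxage` value in the response header is what MediaWiki asks for, while the cache layer may apply its own upper bound. The Python sketch below works through that arithmetic with the header quoted above. It assumes the cache simply takes the smaller of the two values; the 24-hour cap is the figure claimed in the thread, not a verified setting, and `parse_s_maxage` and `effective_ttl` are hypothetical helper names.

```python
def parse_s_maxage(cache_control):
    """Extract the s-maxage value (in seconds) from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "s-maxage" and value.isdigit():
            return int(value)
    return None


def effective_ttl(cache_control, ttl_cap_seconds):
    """Assumed behaviour: the cache keeps an object for min(s-maxage, cap)."""
    s_maxage = parse_s_maxage(cache_control)
    if s_maxage is None:
        return None
    return min(s_maxage, ttl_cap_seconds)


header = "s-maxage=1209600, must-revalidate, max-age=0"
requested = parse_s_maxage(header)       # 1209600 s = 14 days
capped = effective_ttl(header, 24 * 3600)  # 86400 s = 24 hours under the assumed cap
```

Under these assumptions, logged-out users could see stale pages for at most 24 hours, even though the header requests two weeks.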

Perhaps this will be interesting to some people.

A Hebrew Wikipedia user complained that adapting the category https://en.wikipedia.org/wiki/Category:Mountains_of_Graham_Land to Hebrew didn't work. I checked the sitelinks, and it was correctly linked on the category page, but on the category's history page in the Hebrew Wikipedia there is a "Wikidata item" link in the sidebar under "Tools", and it pointed to the deleted duplicate item https://www.wikidata.org/wiki/Q89608061 . I guess there is some stale caching. I don't see this link in the English Wikipedia.

@Amire80 Indeed, this is due to caching. Usually, purging the page or a null edit fixes the problem.

Can this be done automatically to all the affected pages?

It will be done automatically, eventually. To speed up the process on a particular wiki, something like https://en.wikipedia.org/wiki/User:Joe%27s_Null_Bot or pywikibot's touch.py could be used, subject to local bot policies.
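
For anyone wanting to script the purge approach mentioned above, here is a hedged Python sketch that only builds the requests and does not send them. It uses the MediaWiki Action API's `action=purge` with `forcelinkupdate`; the helper name `build_purge_requests`, the batch size of 50 and the example API URL are assumptions for illustration. Whether a purge is enough for a given page, or a null edit is needed, is not settled in this thread.

```python
import urllib.parse
import urllib.request


def build_purge_requests(api_url, titles, batch_size=50):
    """Build one POST request per batch of titles for action=purge.

    Nothing is sent here; the caller decides when and how fast to submit.
    """
    requests = []
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        params = {
            "action": "purge",
            "titles": "|".join(batch),  # the API separates titles with "|"
            "forcelinkupdate": "1",     # also refresh the links tables
            "format": "json",
        }
        data = urllib.parse.urlencode(params).encode("utf-8")
        requests.append(urllib.request.Request(api_url, data=data, method="POST"))
    return requests


# Hypothetical usage: two affected pages on one client wiki.
example = build_purge_requests(
    "https://he.wikipedia.org/w/api.php",
    ["Example page 1", "Example page 2"],
)
```

Each request could then be passed to `urllib.request.urlopen`, ideally with a descriptive User-Agent header and a delay between batches, and always subject to the local bot policies mentioned above.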

Change 587389 merged by jenkins-bot:
[mediawiki/core@master] Revert "maintenance: Remove sql.php temporarily"

https://gerrit.wikimedia.org/r/587389

Is it the time to unblock EmausBot?

I don't think so, at least for now; better to err on the side of caution

Now? I restarted Pi bot yesterday, with no reported problems from that. I've unblocked EmausBot on enwp, but left the block in place on Wikidata for now.

I've unblocked on wikidata, and will try to keep an eye on its actions

Can this task be closed, or is there anything else left apart from finishing the IR?