Page MenuHomePhabricator

Wikibase\Lib\Store\Sql\SiteLinkTable::getLinks can take over a minute to execute
Closed, ResolvedPublic5 Estimated Story Points

Description

This weekend, while an ongoing incident was being handled, I checked and saw several badly performing queries running. These didn't have (I believe) any relation with the incident, but any web request that takes more than 1 or a few seconds (in the worse possible case) is worrying, as they can pile up and create load issues on the database.

While there was several queries with bad performance (mostly related to recentchanges and API), the worst occurring query right now is:

SELECT /* Wikibase\Lib\Store\Sql\SiteLinkTable::getLinks */ ips_site_id, ips_site_page, ips_item_id FROM `wb_items_per_site` WHERE ips_site_id = '?';

where '?' can be any production wiki, but in particular, for enwiki, it selects over 8 million rows. We saw over 30 of those, over all s8 servers, today Mar 8 at 8:18 UTC.

This not only takes over 1 minute to run, but is likely to be not useful due to the amount of memory consumed.
While reading large number of rows can be useful for something like a dump/analytics, it is unlikely to be useful for a web request. Moreso, the application server will struggle to handle 2GB of data at a time, even in memory, that this query returns.

Is it possible some extra WHERE or LIMIT was missing from the query?

Acceptance Criteria🏕️🌟 :

  • There are no unlimited selects happening for this query

Event Timeline

Offtopic- and feel free to PM in private. What is a good way to report database-related issues to wikidata development team? I am a bit intimidated by the amount of tags and dashboards (which will probably reflect your internal organization, but I am not too familiar with it). I trust you already that any tag (even incorrect!) I use always end up being attended (and I am happy with it, thank you!)- but please provide feedback if I can make your life easier for you when triaging (who to CC? tags to use?, etc.)

Thank you!

Change 705649 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Check for empty array in ImplicitDescriptionUsageLookup

https://gerrit.wikimedia.org/r/705649

Change 705652 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Log warning when querying for all links of sites

https://gerrit.wikimedia.org/r/705652

Change 705668 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Use null to mark no conditions in SiteLinkLookup

https://gerrit.wikimedia.org/r/705668

Alright, I uploaded two separate patch set collections:

Either of them can be merged, but I don’t think merging both would make sense.

Change 705680 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/ArticlePlaceholder@master] Remove empty array check before SiteLinkLookup call

https://gerrit.wikimedia.org/r/705680

Addshore triaged this task as Medium priority.Jul 20 2021, 2:47 PM
Addshore set the point value for this task to 5.Jul 21 2021, 10:32 AM

Guess I should claim this if I already worked on it.

Disclaimer: especially for the first change (“Check for empty array in ImplicitDescriptionUsageLookup”), I’m only assuming that this is where the bad call came from. I looked through the getLinks() callers, and this seemed to be the only one where the argument was a possibly-empty array (and not something like [ $singleItemId->getNumericId() ]) which wasn’t checked for emptiness. But I might have missed something, or I could be totally wrong about what caused this issue.

Change 705652 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Log warning when querying for all links of sites

https://gerrit.wikimedia.org/r/705652

Change 705652 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Log warning when querying for all links of sites

https://gerrit.wikimedia.org/r/705652

@Addshore does that mean you want to keep the footgunny interface?

Well, since the logging was deployed, we can at least look at it in Logstash:

Screenshot 2021-08-03 at 11-28-35 mediawiki-warnings - Elastic.png (194×676 px, 9 KB)

That’s 7006 events over the past 7 days. The logged sites always seem to be a single-element array with the site the request was from, including at least one log event (probably many more) where it was querying for all links to enwiki 😱

The stack trace indicates this is indeed from ImplicitDescriptionUsageLookup:

exception.trace
from /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/lib/includes/Store/Sql/SiteLinkTable.php(307)
#0 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/lib/includes/Store/CachingSiteLinkLookup.php(93): Wikibase\Lib\Store\Sql\SiteLinkTable->getLinks(array, array, array)
#1 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/Usage/ImplicitDescriptionUsageLookup.php(214): Wikibase\Lib\Store\CachingSiteLinkLookup->getLinks(array, array)
#2 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/Usage/ImplicitDescriptionUsageLookup.php(137): Wikibase\Client\Usage\ImplicitDescriptionUsageLookup->findImplicitUsages(array)
#3 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/Changes/AffectedPagesFinder.php(378): Wikibase\Client\Usage\ImplicitDescriptionUsageLookup->getPagesUsing(array, array)
#4 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/Changes/AffectedPagesFinder.php(295): Wikibase\Client\Changes\AffectedPagesFinder->transformAllPageEntityUsages(Generator, Wikibase\DataModel\Entity\PropertyId, array)
#5 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/Changes/AffectedPagesFinder.php(95): Wikibase\Client\Changes\AffectedPagesFinder->filterUpdates(Generator)
#6 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/Changes/ChangeHandler.php(125): Wikibase\Client\Changes\AffectedPagesFinder->getAffectedUsagesByPage(Wikibase\Lib\Changes\EntityChange)
#7 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/Changes/ChangeHandler.php(101): Wikibase\Client\Changes\ChangeHandler->handleChange(Wikibase\Lib\Changes\EntityChange, array)
#8 /srv/mediawiki/php-1.37.0-wmf.16/extensions/Wikibase/client/includes/ChangeNotificationJob.php(113): Wikibase\Client\Changes\ChangeHandler->handleChanges(array, array)
#9 /srv/mediawiki/php-1.37.0-wmf.16/extensions/EventBus/includes/JobExecutor.php(79): Wikibase\Client\ChangeNotificationJob->run()
#10 /srv/mediawiki/rpc/RunSingleJob.php(76): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#11 {main}

Change 705652 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Log warning when querying for all links of sites

https://gerrit.wikimedia.org/r/705652

@Addshore does that mean you want to keep the footgunny interface?

No, just that I thought the logging would be interesting to see!

Change 705668 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Use null to mark no conditions in SiteLinkLookup

https://gerrit.wikimedia.org/r/705668

Change 705649 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/Wikibase@master] Check for empty array in ImplicitDescriptionUsageLookup

Reason:

I1572fa4d23 was done instead

https://gerrit.wikimedia.org/r/705649

Change 705680 merged by jenkins-bot:

[mediawiki/extensions/ArticlePlaceholder@master] Remove empty array check before SiteLinkLookup call

https://gerrit.wikimedia.org/r/705680

After wmf.18 is deployed, we can verify that the warning doesn’t get logged anymore, and/or that the expensive queries are gone.

(I don’t think backporting this change is worth the effort, it touches too many files.)