Page MenuHomePhabricator

Investigate sensible DB load group splits for Wikidata / Wikibase
Closed, ResolvedPublic

Description

See parent task for details. We want to investigate what kinds of groups might make sense for Wikidata / Wikibase DB queries. Some suggestions: client/repo as proposed in T246415, or per site type (wikipedia), or per site (enwiki); API/non-API; terms/non-terms.

Event Timeline

Change 628316 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Introduce and use StatsdMonitoring trait in term store

https://gerrit.wikimedia.org/r/628316

Change 628316 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Introduce and use StatsdMonitoring trait in term store

https://gerrit.wikimedia.org/r/628316

Change 628808 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.9] Introduce and use StatsdMonitoring trait in term store

https://gerrit.wikimedia.org/r/628808

Change 628808 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.9] Introduce and use StatsdMonitoring trait in term store

https://gerrit.wikimedia.org/r/628808

Mentioned in SAL (#wikimedia-operations) [2020-09-21T15:50:12Z] <ladsgroup@deploy1001> Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: [[gerrit:628808|Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2020-09-21T15:51:36Z] <ladsgroup@deploy1001> Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: [[gerrit:628808|Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 56s)

Playing around with a new panel on grafana and looking at how these statsd things are called from the code.

the statsd measurement of PrefetchingItemTermLookup_prefetchTerms is always followed by DatabaseTermIdsResolver_selectTermsViaJoin

which both seem to be the queries with to have the biggest impacts from client. But using them both in the percentage panel might give a incorrect result, we should probably only use one?

Playing around with a new panel on grafana and looking at how these statsd things are called from the code.

the statsd measurement of PrefetchingItemTermLookup_prefetchTerms is always followed by DatabaseTermIdsResolver_selectTermsViaJoin

which both seem to be the queries with to have the biggest impacts from client. But using them both in the percentage panel might give a incorrect result, we should probably only use one?

If it's causing two queries and not one, it should be counted twice since we care about the load. Am I missing something obvious?

If it's causing two queries and not one, it should be counted twice since we care about the load. Am I missing something obvious?

It’s not necessarily causing two queries, if I understand correctly. PrefetchingItemTermLookup::prefetchTerms calls DatabaseTermIdsResolver::selectTermsViaJoin (via DatabaseTermIdsResolver::selectTermsViaJoin), so both statsd measurements may correspond to the same single query. But PrefetchingItemTermLookup::prefetchTerms actually calls DatabaseTermIdsResolver::selectTermsViaJoin in a loop, too, so there may be more than one DatabaseTermIdsResolver_selectTermsViaJoin measurement for the same PrefetchingItemTermLookup_prefetchTerms measurement, if I’m not mistaken. But we’re always overcounting by one, I think, and should probably remove the statsd measurement for DatabaseTermIdsResolver_selectTermsViaJoin.

Suggested way forward: stop incrementing the statsd counter for PrefetchingItemTermLookup::prefetchTerms to get the correct result.

Though it doesn’t seem to make a big difference, unless I did it wrong (feel free to review in Grafana):

Screenshot_2020-09-29 Wikibase SQL Term Storage - Grafana.png (390×1 px, 54 KB)

But I don’t think this is a problem with just PrefetchingItemTermLookup… from a quick git grep, roughly two thirds of $this->incrementForQuery() calls aren’t directly related to a database query, I’d say. If I’m not mistaken, this new StatsdMonitoring trait (which creates the measurements for wikibase.query_contexts.{repo,client},term_store,{query_type}) was introduced by replacing existing calls to statsd, but those existing calls had never been written with the intention to correspond 1:1 to database queries, so now we’re getting skewed data, where the same query might be logged multiple times at different levels.

I'm pretty sure it's not two thirds, the only mistake I found is CachedDatabasePropertyLabelResolver_loadProperties, the rest are related to a query but sometimes it's slightly hidden. Let me clean it up.

Three things are mistakes: PrefetchingPropertyTermLookup_prefetchTerms, PrefetchingItemTermLookup_prefetchTerms, CachedDatabasePropertyLabelResolver_loadProperties. The other nine are correct. We can either remove calls to the trait (but it might break some other metrics) or remove them from the percentage graph (which I hope shouldn't be too hard).

Addshore claimed this task.
Addshore subscribed.

Marking as resolved, as the investigation is done