See parent task for details. We want to investigate what kinds of groups might make sense for Wikidata / Wikibase DB queries. Some suggestions: client/repo as proposed in T246415, or per site type (wikipedia), or per site (enwiki); API/non-API; terms/non-terms.
Description
Details
| Status | Subtype | Assigned | Task | ||
|---|---|---|---|---|---|
| Resolved | Addshore | T246415 Investigate a different db load groups for wikidata / wikibase | |||
| Resolved | Addshore | T262923 Investigate sensible DB load group splits for Wikidata / Wikibase |
Event Timeline
Change 628316 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Introduce and use StatsdMonitoring trait in term store
Change 628316 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Introduce and use StatsdMonitoring trait in term store
Change 628808 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.9] Introduce and use StatsdMonitoring trait in term store
Change 628808 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.9] Introduce and use StatsdMonitoring trait in term store
Mentioned in SAL (#wikimedia-operations) [2020-09-21T15:50:12Z] <ladsgroup@deploy1001> Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: [[gerrit:628808|Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 59s)
Mentioned in SAL (#wikimedia-operations) [2020-09-21T15:51:36Z] <ladsgroup@deploy1001> Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: [[gerrit:628808|Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 56s)
Now we have a dashboard for ratio queries coming from client wikis: https://grafana.wikimedia.org/d/000000548/wikibase-sql-term-storage?viewPanel=22&orgId=1&refresh=30s&from=now-3h&to=now
Currently, it varies between 60% to 70%
Playing around with a new panel on grafana and looking at how these statsd things are called from the code.
the statsd measurement of PrefetchingItemTermLookup_prefetchTerms is always followed by DatabaseTermIdsResolver_selectTermsViaJoin
which both seem to be the queries with to have the biggest impacts from client. But using them both in the percentage panel might give a incorrect result, we should probably only use one?
If it's causing two queries and not one, it should be counted twice since we care about the load. Am I missing something obvious?
If it's causing two queries and not one, it should be counted twice since we care about the load. Am I missing something obvious?
It’s not necessarily causing two queries, if I understand correctly. PrefetchingItemTermLookup::prefetchTerms calls DatabaseTermIdsResolver::selectTermsViaJoin (via DatabaseTermIdsResolver::selectTermsViaJoin), so both statsd measurements may correspond to the same single query. But PrefetchingItemTermLookup::prefetchTerms actually calls DatabaseTermIdsResolver::selectTermsViaJoin in a loop, too, so there may be more than one DatabaseTermIdsResolver_selectTermsViaJoin measurement for the same PrefetchingItemTermLookup_prefetchTerms measurement, if I’m not mistaken. But we’re always overcounting by one, I think, and should probably remove the statsd measurement for DatabaseTermIdsResolver_selectTermsViaJoin.
Suggested way forward: stop incrementing the statsd counter for PrefetchingItemTermLookup::prefetchTerms to get the correct result.
Though it doesn’t seem to make a big difference, unless I did it wrong (feel free to review in Grafana):
But I don’t think this is a problem with just PrefetchingItemTermLookup… from a quick git grep, roughly two thirds of $this->incrementForQuery() calls aren’t directly related to a database query, I’d say. If I’m not mistaken, this new StatsdMonitoring trait (which creates the measurements for wikibase.query_contexts.{repo,client},term_store,{query_type}) was introduced by replacing existing calls to statsd, but those existing calls had never been written with the intention to correspond 1:1 to database queries, so now we’re getting skewed data, where the same query might be logged multiple times at different levels.
I'm pretty sure it's not two thirds, the only mistake I found is CachedDatabasePropertyLabelResolver_loadProperties, the rest are related to a query but sometimes it's slightly hidden. Let me clean it up.
Three things are mistakes: PrefetchingPropertyTermLookup_prefetchTerms, PrefetchingItemTermLookup_prefetchTerms, CachedDatabasePropertyLabelResolver_loadProperties. The other nine are correct. We can either remove calls to the trait (but it might break some other metrics) or remove them from the percentage graph (which I hope shouldn't be too hard).
