https://discovery.wmflabs.org/wdqs/#wdqs_usage is not updating since August 10th. Can we bring it back to life?
Description
Details
operations/puppet : production | Add chelsyx to analytics-search-users group |
wikimedia/discovery/golden : master | Make generated reports be owned by analytics-search-users group |
operations/puppet : production | Add analytics-search system user to analytics-search-users group |
wikimedia/discovery/golden : master | Change which partition WDQS requests are counted from |
Status | Assigned | Task
---|---|---
Resolved | mpopov | T204415 Query stats dashboard not updating
Resolved | mpopov | T205441 'group' parameter in Reportupdater for automatic chgrp of generated reports
Event Timeline
Hi @Smalyshev, the dashboard is updating. But since August 10th, the SPARQL usage number is very small (even 0 for certain days) and the LDF usage number is 0. Did we change the URI of the endpoint?
Query:
SELECT year, month, day, uri_path, COUNT(*) AS events
FROM webrequest
WHERE webrequest_source = 'misc'
  AND year = 2018 AND month = 8 AND day > 9
  AND uri_host = 'query.wikidata.org'
  AND uri_path IN ('/', '/bigdata/namespace/wdq/sparql', '/bigdata/ldf', '/sparql')
GROUP BY year, month, day, uri_path
ORDER BY year, month, day
LIMIT 10000
But since August 10th, the SPARQL usage number is very small (even 0 for certain days)
That cannot be, the actual usage is definitely not 0. I see a lot of accesses on turnilo.wikimedia.org and in kibana. See for example: https://logstash.wikimedia.org/goto/74e376f55fcdc3b93e4a7232cfa5203a
Maybe something changed in request logging?
Did we change the URI of the endpoint?
No, the endpoint name is the same.
Hi @Nuria, we noticed that since August 10th, the SPARQL usage number is very small (see query in T204415#4590108), which is much less than what we see in logstash: https://logstash.wikimedia.org/goto/74e376f55fcdc3b93e4a7232cfa5203a
Do you know of any incident with webrequest that might have caused this?
Yep exactly, cache misc (where query.wikidata.org was hosted) has been migrated to cache text, therefore all the Hive queries (and related) should be using 'webrequest_text' from now on.
See also T200822: Remove webrequest misc analytics related jobs and code after cache misc -> text merge is complete and T164609: Merge cache_misc into cache_text functionally. Sorry y'all didn't know about this. I wonder if there is a better way we can configure datasources like this more dynamically. Hm.
all the Hive queries (and related) should be using 'webrequest_text' from now on.
e.g. WHERE webrequest_source = 'text'
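For illustration, here is a minimal Python sketch of keeping the partition name in one place, so a change like the misc -> text merge only needs a one-word edit. It rebuilds the query quoted earlier in this thread; the function name `build_wdqs_usage_query` and its parameters are hypothetical and not part of golden or Reportupdater.

```python
# Paths counted as WDQS usage, copied from the query earlier in this thread.
WDQS_PATHS = ("/", "/bigdata/namespace/wdq/sparql", "/bigdata/ldf", "/sparql")


def build_wdqs_usage_query(source, year, month, after_day):
    """Return the usage-count query for one webrequest_source partition.

    Hypothetical helper: `source` is the partition name (e.g. 'text').
    """
    if not source.isalnum():
        # Guard against pasting arbitrary text into the query string.
        raise ValueError("unexpected webrequest_source: %r" % (source,))
    paths = ", ".join("'%s'" % p for p in WDQS_PATHS)
    return (
        "SELECT year, month, day, uri_path, COUNT(*) AS events\n"
        "FROM webrequest\n"
        "WHERE webrequest_source = '%s'\n"
        "  AND year = %d AND month = %d AND day > %d\n"
        "  AND uri_host = 'query.wikidata.org'\n"
        "  AND uri_path IN (%s)\n"
        "GROUP BY year, month, day, uri_path\n"
        "ORDER BY year, month, day\n"
        "LIMIT 10000"
    ) % (source, int(year), int(month), int(after_day), paths)


if __name__ == "__main__":
    # Same date range as the original query, but read from 'text'.
    print(build_wdqs_usage_query("text", 2018, 8, 9))
```

This only moves the hard-coded assumption into a single argument; it does not detect partition changes on its own.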
Assigned to @mpopov. Again, our apologies that the data sources are hardcoded like this. As I mentioned in our meeting, a better path forward would be using the tags for wdqs to identify the requests: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/WDQSTagger.java
Thanks for looking into it, @Nuria! And for confirming, @elukey @Ottomata! :)
A note for Operations: this is not the first time we've encountered an issue like this. Last year our query for Maps usage stopped working because of partition changes that we weren't told of (T167083), and this is exactly like that. Nobody on Product-Analytics is subscribed to ops@lists.wikimedia (because 99.999% of those threads would be irrelevant to us), so I just want to point out that the decisions made by Ops that affect data sources like wmf.webrequest table need to be communicated to analysts who rely on those data sources.
I don't think it's reasonable to expect, say, @Gehel to notice those emails in his mailbox and notify us, so I suggest that when authoring emails announcing big, data source-related changes like partition drops and renames, please cc product-analytics@wikimedia.org since we have scripts and queries that operate on those data sources under certain hard-coded assumptions.
BTW, the query has to filter by path anyway because it also counts WDQS homepage visits, so we're not switching to tags in this case.
Change 462577 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/golden@master] Change which partition WDQS requests are counted from
Change 462577 merged by Bearloga:
[wikimedia/discovery/golden@master] Change which partition WDQS requests are counted from
@Ottomata @Gehel: I tried editing stat1005:/srv/published-datasets/discovery/metrics/wdqs/basic_usage.tsv but couldn't because the file belongs to group analytics-search, not analytics-search-users (which I belong to) and that sort of makes sense because of how we have it configured right now in statistics::discovery:
$user = 'analytics-search'
$group = 'analytics-privatedata-users'
...
cron { 'wikimedia-discovery-golden':
    ensure  => present,
    command => "cd ${dir}/golden && sh main.sh >> ${log_dir}/golden-daily.log 2>&1",
    hour    => '5',
    minute  => '0',
    require => [
        Class['::statistics::compute'],
        Git::Clone['wikimedia/discovery/golden'],
        Mysql::Config::Client['discovery-stats']
    ],
    user    => $user,
}
and main.sh in wikimedia/discovery/golden repo that generates these datasets:
# files created / touched by report updater need to be rw for user and group
umask 002
From Puppet 3.8 documentation for cron, it's not clear whether we can…somehow set a group? (Would that even make sense?)
I need to edit that file to erase all request counts affected by the 'misc' partition drop that we can recount from the 'text' partition.
@mpopov, since that file is managed by Puppet, you'll have to make a puppet patch to change it!
Oh sorry, misunderstood. Yes we should be able to make the output of the file writable by you somehow.
Change 462580 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add analytics-search system user to analytics-search-users group
Change 462580 merged by Ottomata:
[operations/puppet@production] Add analytics-search system user to analytics-search-users group
Ok, I've added the analytics-search system user to the analytics-search-users group. You should make your script chgrp analytics-search-users <file> after it creates it.
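As a rough illustration of the chgrp suggestion above, this is what the post-creation step could look like in Python. It is only a sketch: `hand_over_to_group` is a hypothetical name, and the fix tracked in this task goes through a 'group' parameter in Reportupdater (T205441), which may be implemented differently.

```python
import os
import shutil
import stat


def hand_over_to_group(path, group):
    """Set the group of `path` and make it readable and writable by that group.

    `group` may be a group name or a numeric gid. The calling user must own
    the file and belong to the target group (or be root); otherwise
    shutil.chown raises PermissionError, or LookupError for an unknown name.
    """
    shutil.chown(path, group=group)
    # Keep the existing permission bits and add group read/write.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRGRP | stat.S_IWGRP)
```

A script would call something like `hand_over_to_group(report_path, "analytics-search-users")` right after writing each report, where `report_path` stands for whatever file it just created.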
WMDE wants to pull some data from this dashboard for some internal reporting (data needed for September).
Any ETA on the fix and backfill? :)
Thank you very much, Andrew! That's gonna need to be done with T205441, which I've started on. That's step 1, for which I'll need @mforns's help with CR and with enabling the parameter to be specified in the defaults section of the YAML config.
Step 2 is Chelsy/me updating the configs to specify the analytics-search-users group and updating the Reportupdater submodule in golden to the patched version.
Step 3 is letting Reportupdater run once so it changes the file permissions.
Step 4 is clearing out dates in the WDQS report which will need to be recounted.
Step 5 is Reportupdater backfilling the missing dates using the patched query.
@Addshore hopefully step 5 will be done by end of the week! :)
Change 463361 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/golden@master] Make generated reports be owned by analytics-search-users group
Change 463361 merged by Bearloga:
[wikimedia/discovery/golden@master] Make generated reports be owned by analytics-search-users group
Change 463517 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[operations/puppet@production] Add chelsyx to analytics-search-users group
Alright, I wiped all the request counts starting with August 10th (after making a backup) so Golden/Reportupdater is going to start a re-count using the webrequests in the 'text' partition. WDQS stats re-count should be done by Monday. Thanks for your patience, folks!
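For anyone who needs to repeat this kind of wipe, here is a minimal Python sketch of the "back up, then drop rows from a cutoff date onward" step. It assumes the report is a TSV with a header row and an ISO date (YYYY-MM-DD) in the first column; that layout and the function name `drop_rows_from` are assumptions for illustration, not taken from the actual basic_usage.tsv or from Reportupdater.

```python
import csv
import shutil
from datetime import date


def drop_rows_from(path, cutoff, backup_suffix=".bak"):
    """Back up `path`, then rewrite it without rows dated on or after `cutoff`.

    Assumes a tab-separated file with a header row and an ISO date in the
    first column. Returns the number of rows removed.
    """
    shutil.copy2(path, path + backup_suffix)  # keep a copy before editing
    with open(path, newline="") as f:
        rows = [r for r in csv.reader(f, delimiter="\t") if r]
    if not rows:
        return 0
    header, body = rows[0], rows[1:]
    kept = [r for r in body if date.fromisoformat(r[0]) < cutoff]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(kept)
    return len(body) - len(kept)
```

For the case in this task the call would be along the lines of `drop_rows_from(report_path, date(2018, 8, 10))`, after which the missing dates are left for Reportupdater to re-count.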
Change 463517 merged by Ottomata:
[operations/puppet@production] Add chelsyx to analytics-search-users group