
Query stats dashboard not updating
Closed, Resolved, Public

Description

https://discovery.wmflabs.org/wdqs/#wdqs_usage is not updating since August 10th. Can we bring it back to life?

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 15 2018, 6:41 PM

@mpopov, @chelsyx, do you know anything about this?

Restricted Application added a project: Product-Analytics. Sep 15 2018, 9:57 PM
Addshore moved this task from incoming to monitoring on the Wikidata board.
chelsyx added a comment (edited). Sep 17 2018, 5:22 PM

Hi @Smalyshev, the dashboard is updating. But since August 10th, the SPARQL usage number is very small (even 0 for certain days) and the LDF usage number is 0. Did we change the URI of the endpoint?

Query:

SELECT
  year, month, day,
  uri_path,
  COUNT(*) AS events
FROM webrequest
WHERE
  webrequest_source = 'misc'
  AND year = 2018 AND month = 8 AND day > 9
  AND uri_host = 'query.wikidata.org'
  AND uri_path IN('/', '/bigdata/namespace/wdq/sparql', '/bigdata/ldf', '/sparql')
GROUP BY
  year, month, day,
  uri_path
ORDER BY year, month, day
LIMIT 10000

> But since August 10th, the SPARQL usage number is very small (even 0 for certain days)

That cannot be; the actual usage is definitely not 0. I see a lot of requests in Turnilo (turnilo.wikimedia.org) and in Kibana. See for example: https://logstash.wikimedia.org/goto/74e376f55fcdc3b93e4a7232cfa5203a

Maybe something changed in request logging?

> Did we change the URI of the endpoint?

No, the endpoint name is the same.

Smalyshev triaged this task as Medium priority. Sep 17 2018, 5:54 PM

Hi @Nuria, we noticed that since August 10th the SPARQL usage number has been very small (see the query in T204415#4590108), much lower than what we see in Logstash: https://logstash.wikimedia.org/goto/74e376f55fcdc3b93e4a7232cfa5203a
Do you know of any webrequest incident that might have caused this?

Milimetric assigned this task to Nuria. Sep 24 2018, 4:04 PM
Milimetric moved this task from Incoming to Radar on the Analytics board.
Nuria added a comment. Sep 24 2018, 5:03 PM

Misc is no longer in service; all requests have been migrated to 'text'.

elukey added a subscriber: elukey. Sep 24 2018, 5:28 PM

Yep, exactly. Cache misc (where query.wikidata.org was hosted) has been migrated to cache text, so all the Hive queries (and related) should be using 'webrequest_text' from now on.

See also T200822: Remove webrequest misc analytics related jobs and code after cache misc -> text merge is complete and T164609: Merge cache_misc into cache_text functionally. Sorry y'all didn't know about this. I wonder if there is a better way we can configure data sources like this more dynamically. Hm.

> all the Hive queries (and related) should be using 'webrequest_text' from now on.

e.g. WHERE webrequest_source = 'text'
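For illustration, here is the same query with the partition switched to 'text', produced by a hypothetical Python helper (`build_wdqs_usage_query` is not part of golden) that takes the partition as a parameter instead of hardcoding it. It is restricted to a single day for brevity; the table, columns, and paths are copied from the query above.

```python
from datetime import date

# Paths counted for WDQS usage, copied from the original query.
WDQS_PATHS = ("/", "/bigdata/namespace/wdq/sparql", "/bigdata/ldf", "/sparql")


def build_wdqs_usage_query(day: date, webrequest_source: str = "text") -> str:
    """Return a HiveQL query counting query.wikidata.org requests per path for one day.

    Hypothetical helper: the partition is a parameter because cache_misc was
    merged into cache_text, so the 'misc' partition no longer receives these requests.
    """
    paths = ", ".join(f"'{p}'" for p in WDQS_PATHS)
    return f"""SELECT
  year, month, day,
  uri_path,
  COUNT(*) AS events
FROM webrequest
WHERE
  webrequest_source = '{webrequest_source}'
  AND year = {day.year} AND month = {day.month} AND day = {day.day}
  AND uri_host = 'query.wikidata.org'
  AND uri_path IN({paths})
GROUP BY
  year, month, day,
  uri_path"""


if __name__ == "__main__":
    print(build_wdqs_usage_query(date(2018, 8, 10)))
```

Keeping the partition in one place means the next cache-cluster change is a one-line config edit rather than a hunt through query files.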

Nuria reassigned this task from Nuria to mpopov. Sep 24 2018, 5:36 PM

Assigned to @mpopov. Again, our apologies that the data sources are hardcoded like this. As I mentioned in our meeting, a better path to go forward would be using the tags for wdqs to identify the requests: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/WDQSTagger.java

mpopov added a subscriber: Gehel. Sep 24 2018, 5:43 PM

Thanks for looking into it, @Nuria! And for confirming, @elukey @Ottomata! :)

A note for Operations: this is not the first time we've encountered an issue like this. Last year our query for Maps usage stopped working because of partition changes we weren't told about (T167083), and this is exactly like that. Nobody on Product-Analytics is subscribed to ops@lists.wikimedia (because 99.999% of those threads would be irrelevant to us), so I just want to point out that decisions made by Ops that affect data sources like the wmf.webrequest table need to be communicated to the analysts who rely on those data sources.

I don't think it's reasonable to expect, say, @Gehel to notice those emails in his mailbox and notify us, so when authoring emails that announce big changes to data sources, like partition drops and renames, please cc product-analytics@wikimedia.org, since we have scripts and queries that operate on those data sources under certain hard-coded assumptions.

> a better path to go forward would be using the tags for wdqs to identify the requests

BTW, the query has to filter by path anyway, because it also counts WDQS homepage visits, so we're not switching to tags in this case.
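A minimal sketch of why the path still matters: the query counts WDQS homepage visits alongside SPARQL and LDF requests, so the counts have to be split by `uri_path`. This assumes rows shaped like the query output, reduced to (day, uri_path, events); the category names and the `daily_usage` helper are hypothetical, and grouping `/sparql` with `/bigdata/namespace/wdq/sparql` is an assumption based on the paths listed in the query.

```python
from collections import defaultdict

# Hypothetical mapping from the uri_path values in the query to dashboard series.
PATH_CATEGORY = {
    "/": "homepage",
    "/sparql": "sparql",
    "/bigdata/namespace/wdq/sparql": "sparql",
    "/bigdata/ldf": "ldf",
}


def daily_usage(rows):
    """Sum event counts per (day, category) from (day, uri_path, events) rows.

    Rows whose path is not one of the known WDQS paths are skipped.
    """
    totals = defaultdict(int)
    for day, uri_path, events in rows:
        category = PATH_CATEGORY.get(uri_path)
        if category is not None:
            totals[(day, category)] += events
    return dict(totals)
```

The result is keyed by (day, category), so both SPARQL paths collapse into one series while homepage visits stay separate.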

Change 462577 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/golden@master] Change which partition WDQS requests are counted from

https://gerrit.wikimedia.org/r/462577

Change 462577 merged by Bearloga:
[wikimedia/discovery/golden@master] Change which partition WDQS requests are counted from

https://gerrit.wikimedia.org/r/462577

mpopov added a comment (edited). Sep 24 2018, 9:31 PM

@Ottomata @Gehel: I tried editing stat1005:/srv/published-datasets/discovery/metrics/wdqs/basic_usage.tsv but couldn't, because the file belongs to group analytics-search, not analytics-search-users (which I belong to). That sort of makes sense given how we have it configured right now in statistics::discovery:

$user = 'analytics-search'
$group = 'analytics-privatedata-users'

...

cron { 'wikimedia-discovery-golden':
    ensure  => present,
    command => "cd ${dir}/golden && sh main.sh >> ${log_dir}/golden-daily.log 2>&1",
    hour    => '5',
    minute  => '0',
    require => [
        Class['::statistics::compute'],
        Git::Clone['wikimedia/discovery/golden'],
        Mysql::Config::Client['discovery-stats']
    ],
    user    => $user,
}

and in main.sh in the wikimedia/discovery/golden repo, which generates these datasets:

# files created / touched by report updater need to be rw for user and group
umask 002
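For reference, `umask 002` clears only the world-write bit, so a file created with the usual requested mode 0666 ends up 0664: read/write for user and group, read-only for others. A small Python check of that arithmetic, plus a file actually created under the mask; both helper names are hypothetical.

```python
import os
import stat
import tempfile


def mode_under_umask(requested: int, mask: int) -> int:
    """Permission bits a new file gets: the requested mode with the umask bits cleared."""
    return requested & ~mask


def create_with_umask(path: str, mask: int) -> int:
    """Create an empty file under the given umask and return its permission bits."""
    old = os.umask(mask)
    try:
        # The mode passed to os.open is filtered by the process umask, as with open(2).
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        os.close(fd)
    finally:
        os.umask(old)  # restore the previous umask
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    print(oct(mode_under_umask(0o666, 0o002)))  # 0o664, i.e. rw-rw-r--
    with tempfile.TemporaryDirectory() as d:
        print(oct(create_with_umask(os.path.join(d, "report.tsv"), 0o002)))
```

Note that the umask only controls permission bits; the file's group is still the creating user's group, which is why the group ownership question below needs a separate fix.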

From the Puppet 3.8 documentation for cron, it's not clear whether we can somehow set a group. (Would that even make sense?)

I need to edit that file to erase all request counts affected by the 'misc' partition drop, so that we can recount them from the 'text' partition.

@mpopov, since that file is managed by Puppet, you'll have to make a Puppet patch to change it!

Oh sorry, misunderstood. Yes we should be able to make the output of the file writable by you somehow.

Change 462580 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add analytics-search system user to analytics-search-users group

https://gerrit.wikimedia.org/r/462580

Change 462580 merged by Ottomata:
[operations/puppet@production] Add analytics-search system user to analytics-search-users group

https://gerrit.wikimedia.org/r/462580

Ok, I've added the analytics-search system user to the analytics-search-users group. You should make your script chgrp analytics-search-users <file> after it creates it.
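A minimal sketch of that step in Python, assuming the script writes the report itself; `share_with_group` is a hypothetical helper, not something in golden or Reportupdater. `shutil.chown` changes only the group when `user` is omitted, and the calling user must own the file and belong to the target group, which is what the Puppet change above arranges.

```python
import os
import shutil
import stat
from typing import Union


def share_with_group(path: str, group: Union[str, int]) -> None:
    """Equivalent of `chgrp <group> <path>` followed by `chmod g+rw <path>`.

    `group` may be a group name or a numeric gid. Changing a file's group
    requires owning the file and being a member of the target group (or root).
    """
    shutil.chown(path, group=group)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRGRP | stat.S_IWGRP)
```

On stat1005 the call would be something like `share_with_group(report_path, "analytics-search-users")` right after the report is written, where `report_path` is whatever path the script produced.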

WMDE wants to pull some data from this dashboard for some internal reporting (data needed for September).
Any ETA on the fix and backfill? :)

mpopov added a subscriber: mforns. Sep 25 2018, 3:19 PM

> Ok, I've added the analytics-search system user to the analytics-search-users group. You should make your script chgrp analytics-search-users <file> after it creates it.

Thank you very much, Andrew! That's going to need to be done with T205441, which I've started on. That's step 1, for which I'll need @mforns's help with code review (CR) and with enabling the parameter to be specified in the defaults section of the YAML config.

Step 2 is Chelsy/me updating the configs to specify the analytics-search-users group and updating the Reportupdater submodule in golden to the patched version.

Step 3 is letting Reportupdater run once so it changes the file permissions.

Step 4 is clearing out the dates in the WDQS report that need to be recounted.

Step 5 is Reportupdater backfilling the missing dates using the patched query.
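Step 4 could look roughly like the sketch below: keep a backup, then drop every row dated on or after the cutoff, so those dates become missing and Reportupdater backfills them in step 5. This assumes the report is a tab-separated file with a header row and a `date` column in YYYY-MM-DD format; the column name and the `drop_rows_from` helper are hypothetical, not taken from the golden repo.

```python
import csv
import shutil
from datetime import date


def drop_rows_from(path: str, cutoff: date, date_column: str = "date") -> int:
    """Back up a TSV report, then remove rows dated on or after `cutoff`.

    Returns the number of rows removed. Assumes ISO (YYYY-MM-DD) dates.
    """
    shutil.copy2(path, path + ".bak")  # keep a backup before editing
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)
    kept = [r for r in rows if date.fromisoformat(r[date_column]) < cutoff]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(kept)
    return len(rows) - len(kept)
```

For this task the cutoff would be `date(2018, 8, 10)`, the first day affected by the 'misc' partition drop.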

@Addshore hopefully step 5 will be done by the end of the week! :)

Change 463361 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/golden@master] Make generated reports be owned by analytics-search-users group

https://gerrit.wikimedia.org/r/463361

Change 463361 merged by Bearloga:
[wikimedia/discovery/golden@master] Make generated reports be owned by analytics-search-users group

https://gerrit.wikimedia.org/r/463361

Change 463517 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[operations/puppet@production] Add chelsyx to analytics-search-users group

https://gerrit.wikimedia.org/r/463517

Alright, I wiped all the request counts starting with August 10th (after making a backup) so Golden/Reportupdater is going to start a re-count using the webrequests in the 'text' partition. WDQS stats re-count should be done by Monday. Thanks for your patience, folks!

Change 463517 merged by Ottomata:
[operations/puppet@production] Add chelsyx to analytics-search-users group

https://gerrit.wikimedia.org/r/463517

mpopov closed this task as Resolved. Oct 3 2018, 12:27 AM

All good now :)