Add unique devices dataset to pivot so reading PMs have easy access to data
Description
We should be able to have data per country since the beginning. Let's do daily and monthly unique devices.
- create ingestion spec for druid (json)
- hql for hive
- create 2 oozie jobs with coordinators
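As a rough illustration of the first item, a daily ingestion spec could look something like the sketch below. It follows the general shape of Druid's Hadoop batch ingestion spec, but the datasource name, column names (`dt`, `domain`, `country`, `uniques`) and the input path are assumptions made up for this example, not the contents of the actual patch. The exact fields should be checked against the docs for the Druid version in use.

```python
import json


def build_daily_spec(interval_start, interval_end, input_path):
    """Build a Druid Hadoop-batch ingestion spec as a plain dict.

    All dataset-specific names below are hypothetical placeholders.
    """
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": "unique_devices_daily",  # hypothetical name
                "parser": {
                    "type": "hadoopyString",
                    "parseSpec": {
                        "format": "json",
                        # hypothetical timestamp column
                        "timestampSpec": {"column": "dt", "format": "auto"},
                        # hypothetical dimension columns
                        "dimensionsSpec": {"dimensions": ["domain", "country"]},
                    },
                },
                "metricsSpec": [
                    # hypothetical metric column
                    {"type": "longSum", "name": "uniques", "fieldName": "uniques"}
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "day",
                    "intervals": [f"{interval_start}/{interval_end}"],
                },
            },
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {"type": "static", "paths": input_path},
            },
            "tuningConfig": {"type": "hadoop"},
        },
    }


if __name__ == "__main__":
    # Placeholder dates and path, for illustration only.
    spec = build_daily_spec("2017-04-01", "2017-04-02", "/tmp/unique_devices/daily")
    print(json.dumps(spec, indent=2))
```

An oozie job would typically template the interval and input path per run and submit the resulting JSON to the Druid indexing service.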
Change 347611 had a related patch set uploaded (by Joal):
[analytics/refinery@master] [WIP] Add oozie jobs loading uniques in druid
Note: Only daily uniques are imported into druid. Monthly uniques don't work because druid does not allow monthly-granularity queries (the maximum is day).
Change 348052 had a related patch set uploaded (by Joal):
[analytics/refinery@master] Add oozie job loading monthly uniques in druid
@JAllemandou the monthly is the number we tend to use the most and, unlike pageviews, we can't simply roll them up as there is a duplication issue. I'll take what we can get for now, but should we file a separate ticket for monthly?
@JKatzWMF: We want to import monthly (code is ready: https://gerrit.wikimedia.org/r/#/c/348052/), but our version of druid and pivot can't handle the monthly query granularity.
It's on our plate to upgrade soon, you can track progress here: https://phabricator.wikimedia.org/T157977
Change 347611 merged by Nuria:
[analytics/refinery@master] Add oozie job loading daily uniques in druid
Change 349449 had a related patch set uploaded (by Joal):
[operations/puppet@production] Add unique devices in pivot config
Change 349449 merged by Elukey:
[operations/puppet@production] Add unique devices in pivot config
Friendly reminder that before we can close this item, the pivot splash screen needs to have a link to this dataset.
Thanks, this is great! Unfortunately, in the current setup there is a lot of potential for confusion, because on the initial view it will show the sum over the uniques for all projects/language versions, without deduplication (a deduplicated metric is being worked on in T143928).
Is it possible to add some kind of annotation warning users that this total is not valid data, and that they need to filter down to individual projects to obtain meaningful numbers?
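To make the overcounting concrete, here is a toy sketch. The device labels and per-project sets are invented purely for illustration; this is not how the uniques are actually computed, and the dataset may not carry device identifiers at all. It only shows why adding up per-project counts is not the same as counting distinct devices overall.

```python
# Toy data: which (made-up) devices visited each project on one day.
devices_by_project = {
    "en.wikipedia.org": {"device_a", "device_b", "device_c"},
    "de.wikipedia.org": {"device_b", "device_c", "device_d"},
}

# What the initial view effectively shows: a sum of per-project uniques.
summed_uniques = sum(len(devices) for devices in devices_by_project.values())

# What a deduplicated metric would report: distinct devices across projects.
deduplicated_uniques = len(set().union(*devices_by_project.values()))

print(summed_uniques)        # 6
print(deduplicated_uniques)  # 4
```

The same reasoning applies to rolling daily uniques up into a monthly figure: a device seen on several days would be counted once per day.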
Thanks, this is great! Unfortunately, in the current setup there is a lot of potential for confusion, because on the initial view it will show the sum over the uniques for all projects/language versions
Ya, this is a usability fail. I wonder if we can disable that initial view in pivot for this dataset. We can track that work here: https://phabricator.wikimedia.org/T164194
Change 348052 merged by Nuria:
[analytics/refinery@master] Add oozie jobs loading druid monthly uniques