Add daily unique devices dataset to pivot
Closed, Resolved | Public | 8 Estimated Story Points

Description

Add unique devices dataset to pivot so reading PMs have easy access to data

Event Timeline

Nuria triaged this task as Medium priority. Mar 13 2017, 5:08 PM

We should be able to have data per country since the beginning. Let's do daily and monthly unique devices.

  • create ingestion spec for druid (json)
  • hql for hive
  • create 2 oozie jobs with coordinators
Nuria set the point value for this task to 8.

Change 347611 had a related patch set uploaded (by Joal):
[analytics/refinery@master] [WIP] Add oozie jobs loading uniques in druid

https://gerrit.wikimedia.org/r/347611

Note: Only daily uniques are imported into druid. Monthly uniques don't work because druid does not allow monthly-granularity queries (the maximum is day).

Change 348052 had a related patch set uploaded (by Joal):
[analytics/refinery@master] Add oozie job loading monthly uniques in druid

https://gerrit.wikimedia.org/r/348052

@JAllemandou the monthly is the number we tend to use the most and, unlike pageviews, we can't simply roll them up as there is a duplication issue. I'll take what we can get for now, but should we file a separate ticket for monthly?
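To illustrate the duplication issue mentioned above, here is a minimal sketch of why daily uniques can't simply be summed into a monthly figure. The device identifiers and dates are made up purely for illustration and say nothing about how the real dataset is computed; the function names are hypothetical too.

```python
# Hypothetical device identifiers seen on each day (illustrative data only).
daily_devices = {
    "2017-04-01": {"a", "b", "c"},
    "2017-04-02": {"b", "c", "d"},
    "2017-04-03": {"a", "d"},
}


def sum_of_daily_uniques(days):
    """Naive roll-up: add up each day's unique count."""
    return sum(len(devices) for devices in days.values())


def deduplicated_uniques(days):
    """Count each device once across the whole period."""
    return len(set().union(*days.values()))


print(sum_of_daily_uniques(daily_devices))  # 8: returning devices are counted once per day
print(deduplicated_uniques(daily_devices))  # 4: each device is counted once
```

A device that returns on several days is counted once per day in the naive sum, which is why the roll-up that works for pageviews over-counts here.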

@JKatzWMF: We want to import monthly uniques (code is ready: https://gerrit.wikimedia.org/r/#/c/348052/), but our version of druid and pivot can't handle the monthly query granularity.
Upgrading is on our plate soon; you can track progress here: https://phabricator.wikimedia.org/T157977

Change 347611 merged by Nuria:
[analytics/refinery@master] Add oozie job loading daily uniques in druid

https://gerrit.wikimedia.org/r/347611

JAllemandou renamed this task from Add unique devices dataset to pivot to Add daily unique devices dataset to pivot. Apr 19 2017, 2:03 PM

Change 349449 had a related patch set uploaded (by Joal):
[operations/puppet@production] Add unique devices in pivot config

https://gerrit.wikimedia.org/r/349449

Change 349449 merged by Elukey:
[operations/puppet@production] Add unique devices in pivot config

https://gerrit.wikimedia.org/r/349449

Friendly reminder that before we can close this item, the pivot splash screen needs to have a link to this dataset.

Thanks, this is great! Unfortunately, in the current setup there is a lot of potential for confusion, because on the initial view it will show the sum over the uniques for all projects/language versions, without deduplication (a deduplicated metric is being worked on in T143928).
Is it possible to add some kind of annotation warning users that this is not valid data, and that they need to filter down to individual projects to obtain meaningful numbers?

Thanks, this is great! Unfortunately, in the current setup there is a lot of potential for confusion, because on the initial view it will show the sum over the uniques for all projects/language versions

Ya, this is a usability fail. I wonder if we can disable that initial view in pivot for this dataset. We can track that work here: https://phabricator.wikimedia.org/T164194

Change 348052 merged by Nuria:
[analytics/refinery@master] Add oozie jobs loading druid monthly uniques

https://gerrit.wikimedia.org/r/348052