Add daily unique devices dataset to pivot
Closed, Resolved · Public · 8 Story Points

Description

Add unique devices dataset to pivot so reading PMs have easy access to data

Nuria created this task. Mar 2 2017, 6:22 PM
Restricted Application added a subscriber: Aklapper. Mar 2 2017, 6:22 PM
Nuria added a subscriber: JKatzWMF. Mar 2 2017, 6:53 PM
Nuria moved this task from Incoming to Q3 (january 2018) on the Analytics board. Mar 6 2017, 4:31 PM
Nuria triaged this task as Normal priority. Mar 13 2017, 5:08 PM
Nuria moved this task from Q3 (january 2018) to To Task on the Analytics board. Apr 5 2017, 2:51 PM
Nuria added a comment. Apr 6 2017, 4:04 PM

We should be able to have data per country since the beginning. Let's do daily and monthly unique devices.

  • create ingestion spec for druid (json)
  • hql for hive
  • create 2 oozie jobs with coordinators
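
For illustration only, the first bullet could be sketched as follows: a Druid batch ingestion spec for daily uniques, built as a Python dict and serialized to JSON. The datasource name, column names, interval, and input path are hypothetical placeholders and are not taken from the actual patch. The layout follows the general shape of Druid's Hadoop batch ingestion spec and should be checked against the Druid version actually deployed.

```python
import json


def build_daily_uniques_spec(interval, input_path):
    """Return a sketch of a Druid batch ingestion spec as a dict.

    Every name below (datasource, columns) is a hypothetical placeholder.
    """
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": "unique_devices_daily",  # hypothetical name
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "json",
                        # Hypothetical timestamp and dimension columns.
                        "timestampSpec": {"column": "dt", "format": "auto"},
                        "dimensionsSpec": {"dimensions": ["domain", "country"]},
                    },
                },
                "metricsSpec": [
                    {"type": "longSum", "name": "uniques", "fieldName": "uniques"}
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    # Day is the coarsest query granularity mentioned in this task.
                    "queryGranularity": "day",
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {"type": "static", "paths": input_path},
            },
        },
    }


# Placeholder interval and path, for demonstration only.
spec = build_daily_uniques_spec(
    "2017-04-01/2017-04-02", "/tmp/example/daily_uniques.json"
)
spec_json = json.dumps(spec, indent=2)
```

In a setup like the one described here, the oozie job would presumably fill in the interval and input path for each run and submit the resulting JSON to Druid.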
Nuria edited projects, added Analytics-Kanban; removed Analytics. Apr 6 2017, 4:04 PM
Nuria set the point value for this task to 8.
JAllemandou moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 347611 had a related patch set uploaded (by Joal):
[analytics/refinery@master] [WIP] Add oozie jobs loading uniques in druid

https://gerrit.wikimedia.org/r/347611

Note: Only daily uniques are imported into druid. Monthly don't work because of druid not allowing for monthly granularity queries (maximum is day).

Change 348052 had a related patch set uploaded (by Joal):
[analytics/refinery@master] Add oozie job loading monthly uniques in druid

https://gerrit.wikimedia.org/r/348052

JKatzWMF added a comment. Edited Apr 18 2017, 12:06 AM

Note: Only daily uniques are imported into druid. Monthly don't work because of druid not allowing for monthly granularity queries (maximum is day).

@JAllemandou the monthly is the number we tend to use the most and, unlike pageviews, we can't simply roll them up as there is a duplication issue. I'll take what we can get for now, but should we file a separate ticket for monthly?
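
A toy sketch of the duplication issue mentioned above: the same device can appear on several days, so summing daily uniques overcounts the uniques for the whole period. The explicit device IDs here are made up purely for illustration; this is a simplified model and not a description of how the real dataset is computed.

```python
# Toy data: the set of (made-up) device IDs seen on each day.
daily_devices = {
    "2017-04-01": {"a", "b", "c"},
    "2017-04-02": {"a", "b", "d"},
    "2017-04-03": {"a", "e"},
}


def summed_daily_uniques(days):
    """Naive roll-up: add the daily unique counts together."""
    return sum(len(devices) for devices in days.values())


def true_period_uniques(days):
    """Deduplicated count: size of the union of all daily device sets."""
    return len(set().union(*days.values()))


naive = summed_daily_uniques(daily_devices)   # 3 + 3 + 2 = 8
actual = true_period_uniques(daily_devices)   # {a, b, c, d, e} -> 5
```

Pageviews do not have this problem because each view is counted once, so daily totals add up to the monthly total.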

@JKatzWMF: We want to import monthly (code is ready: https://gerrit.wikimedia.org/r/#/c/348052/), but our version of druid and pivot can't handle monthly query granularity.
Upgrading soon is on our plate; you can track progress here: https://phabricator.wikimedia.org/T157977

Change 347611 merged by Nuria:
[analytics/refinery@master] Add oozie job loading daily uniques in druid

https://gerrit.wikimedia.org/r/347611

JAllemandou renamed this task from Add unique devices dataset to pivot to Add daily unique devices dataset to pivot. Apr 19 2017, 2:03 PM
Nuria moved this task from Ready to Deploy to Done on the Analytics-Kanban board. Apr 19 2017, 7:11 PM
Nuria moved this task from Done to Ready to Deploy on the Analytics-Kanban board.

Change 349449 had a related patch set uploaded (by Joal):
[operations/puppet@production] Add unique devices in pivot config

https://gerrit.wikimedia.org/r/349449

Change 349449 merged by Elukey:
[operations/puppet@production] Add unique devices in pivot config

https://gerrit.wikimedia.org/r/349449

Nuria added a comment. Apr 21 2017, 5:45 PM

Friendly reminder that before we can close this item, the pivot splash screen needs to have a link to this dataset.

Nuria closed this task as Resolved. Apr 21 2017, 6:42 PM
Tbayer added a subscriber: Tbayer. Apr 29 2017, 7:06 PM

Thanks, this is great! Unfortunately, in the current setup there is a lot of potential for confusion, because on the initial view it will show the sum over the uniques for all projects/language versions, without deduplication (a deduplicated metric is being worked on in T143928).
Is it possible to add some kind of annotation warning users that this is not valid data, and that they need to filter down to individual projects to obtain meaningful numbers?
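
To make the concern concrete: from per-project counts alone, the deduplicated total cannot be recovered, only bounded. It is at least the largest single project's count and at most the plain sum, which is the figure the initial view shows. A minimal sketch, with hypothetical project names and made-up numbers:

```python
def dedup_bounds(per_project_uniques):
    """Return (lower, upper) bounds on the deduplicated total.

    lower: every device of the largest project is counted at least once.
    upper: the plain sum, exact only if no device visits two projects.
    """
    counts = list(per_project_uniques.values())
    if not counts:
        return (0, 0)
    return (max(counts), sum(counts))


# Hypothetical projects and made-up counts, for illustration only.
example = {"en.wikipedia": 1000, "de.wikipedia": 300, "commons": 200}
lower, upper = dedup_bounds(example)
```

With these numbers the true total lies somewhere between 1000 and 1500, so the summed figure is only meaningful as an upper bound.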

Nuria added a comment. May 1 2017, 2:31 PM

Thanks, this is great! Unfortunately, in the current setup there is a lot of potential for confusion, because on the initial view it will show the sum over the uniques for all projects/language versions

Ya, this is a usability fail. I wonder if we can disable that initial view in pivot for this dataset. We can track that work here: https://phabricator.wikimedia.org/T164194

Change 348052 merged by Nuria:
[analytics/refinery@master] Add oozie jobs loading druid monthly uniques

https://gerrit.wikimedia.org/r/348052