
Productionize loading of edit data into Druid (contingent on success of research spike)
Closed, Resolved | Public | 5 Story Points

Event Timeline

Nuria created this task. Jul 27 2016, 7:16 PM
Nuria removed the point value for this task. Jul 28 2016, 5:37 PM

I'll try to do this using the new hotness Oozie generator: https://github.com/etsy/arbiter

Milimetric moved this task from Incoming to Backlog (Later) on the Analytics board.
Nuria assigned this task to JAllemandou. Dec 15 2016, 5:36 PM
Nuria edited projects, added Analytics-Kanban; removed Analytics.

Ooziefying the Druid loading.

Nuria set the point value for this task to 5. Dec 15 2016, 5:37 PM

Change 328154 had a related patch set uploaded (by Joal):
[WIP] Add oozie job loading MW history in druid

https://gerrit.wikimedia.org/r/328154

Nuria added a comment. Jan 18 2017, 9:07 PM

Given the issues with the volume of edit data in Druid, it seems like this one should go back to "paused", correct?

I would like to down-scope it to load just 1 year of the data so we can show it next week at metrics. I take it upon myself to explain the limitation and lack of updates to people interested, but I think we have no other way to really communicate this work.

Nuria added a comment. Jan 18 2017, 9:21 PM

@Milimetric +1. Agreed. I think we talked about that today in standup. Sounds fine.

Done using Druid loading rules.
Indexing covers the full dataset, so all of it stays available in Hadoop deep storage if needed.
Druid, however, only loads 2 years of data, to make sure there is at least one full year available for analysis (~300GB).
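
For reference, a minimal sketch of what such loading rules could look like, not the configuration actually deployed here. It assumes Druid's standard retention rule types (`loadByPeriod`, `dropForever`) and the coordinator rules endpoint; the datasource name, coordinator URL, and replicant count are hypothetical placeholders. Rules are evaluated in order, so segments from the last 2 years are loaded and everything older is dropped from the historical nodes while the indexed segments stay in deep storage.

```python
import json
from urllib.request import Request


def build_retention_rules(period="P2Y", replicants=2, tier="_default_tier"):
    """Return an ordered list of Druid retention rules.

    The first matching rule wins: segments within `period` of now are
    loaded, all older segments are dropped from the historical nodes.
    The replicant count and tier name are assumptions for illustration.
    """
    return [
        {
            "type": "loadByPeriod",
            "period": period,  # ISO 8601 period, here 2 years
            "tieredReplicants": {tier: replicants},
        },
        {"type": "dropForever"},
    ]


def build_rules_request(coordinator_url, datasource, rules):
    """Build (but do not send) the POST request that sets the rules."""
    url = f"{coordinator_url.rstrip('/')}/druid/coordinator/v1/rules/{datasource}"
    return Request(
        url,
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical coordinator host and datasource name.
    rules = build_retention_rules()
    request = build_rules_request("http://localhost:8081", "example_edit_history", rules)
    print(request.full_url)
    print(json.dumps(rules, indent=2))
```

Sending the request (for example with `urllib.request.urlopen(request)`) is left out on purpose, since it would change the retention settings of a live cluster.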

Awesome, thanks @JAllemandou, can mark this done then.

Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board. Mar 13 2017, 4:33 PM

Change 328154 merged by Joal:
[analytics/refinery] Add oozie job loading MW history in druid

https://gerrit.wikimedia.org/r/328154

Nuria closed this task as Resolved. Mar 22 2017, 7:47 PM