
Productionize loading of edit data into Druid (contingent on success of research spike)
Closed, Resolved | Public | 5 Estimated Story Points

Event Timeline

Nuria removed the point value for this task. Jul 28 2016, 5:37 PM

I'll try to do this using new hotness oozie generator:

Nuria edited projects, added Analytics-Kanban; removed Analytics.

ooziefying the druid loading

Nuria set the point value for this task to 5. Dec 15 2016, 5:37 PM

Change 328154 had a related patch set uploaded (by Joal):
[WIP] Add oozie job loading MW history in druid

Given the issues with the volume of edit data in Druid, it seems like this one should go back to "paused", correct?

I would like to down-scope it to load just 1 year of the data so we can show it next week at metrics. I'll take it upon myself to explain the limitation and the lack of updates to interested people, but I think we have no other way to really communicate this work.

@Milimetric +1. Agreed. I think we talked about that today in standup. Sounds fine.

Done using Druid load rules.
Indexing covers the full dataset, so it remains available in Hadoop deep storage if needed.
Druid, however, only loads 2 years of data, to make sure at least one full year is available for analysis (~300GB).
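
For readers unfamiliar with Druid load rules, here is a minimal sketch of what a "load only the last 2 years" rule chain could look like. It is not the configuration actually applied for this task: the replicant count, the tier name and the helper function `build_retention_rules` are assumptions made for illustration. The rule types `loadByPeriod` and `dropForever` are standard Druid retention rules; the first matching rule wins, so anything older than the period falls through to the drop rule and is unloaded from the historical nodes while the indexed segments stay in deep storage.

```python
import json


def build_retention_rules(period="P2Y", replicants=2, tier="_default_tier"):
    """Build a Druid retention rule chain (hypothetical helper).

    Rules are evaluated in order: segments within `period` of now are
    loaded on historicals, everything else is dropped from them.
    The replicant count and tier name are assumed values.
    """
    return [
        {
            "type": "loadByPeriod",
            "period": period,  # ISO 8601 period, here 2 years
            "tieredReplicants": {tier: replicants},
        },
        # Catch-all: unload segments not matched by the rule above.
        {"type": "dropForever"},
    ]


rules = build_retention_rules()
# This JSON array is the payload shape the Druid coordinator accepts
# for a datasource's rules (via its console or rules HTTP endpoint).
print(json.dumps(rules, indent=2))
```

Keeping the period at 2 years rather than 1 matches the reasoning above: a rolling 2-year window always contains at least one complete calendar year.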

Awesome, thanks @JAllemandou, can mark this done then.

Change 328154 merged by Joal:
[analytics/refinery] Add oozie job loading MW history in druid