|analytics/refinery : master||Add oozie job loading MW history in druid|
|Resolved||None||T120037 Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics|
|Resolved||None||T120036 Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days|
|Open||None||T130256 Wikistats 2.0.|
|Resolved||None||T131779 Implement Pages Created & Count of Edits full vertical slice|
|Open||None||T131782 Put data needed for edits metrics through Event Bus into HDFS|
|Resolved||None||T131786 Load edit history data into Druid|
|Invalid||mforns||T141479 Reportupdater calculations for Pages Created and Edit counts|
|Resolved||None||T143924 Replacing standard edit metrics in dashiki with data from new edit data depot|
|Resolved||JAllemandou||T152035 Productionize Edit History Reconstruction and Extraction|
|Resolved||JAllemandou||T141473 Productionize loading of edit data into Druid (contingent on success of research spike)|
I would like to down-scope this to load just one year of the data so we can show it next week at metrics. I'll take it upon myself to explain the limitation and the lack of updates to anyone interested, but I think we have no other way to really communicate this work.
Done using Druid loading rules.
Indexing covers the full dataset, so that it remains available in Hadoop deep storage if needed.
Druid itself, however, only loads 2 years of data, to make sure there is at least one full year available for analysis (~300 GB).
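For reference, a minimal sketch of what such retention rules could look like when posted to the Druid coordinator's rules endpoint. The coordinator host and the datasource name (`mediawiki_history`) below are assumptions for illustration, not the actual production values:

```python
# Sketch: enforce the 2-year window described above via Druid retention rules.
# Segments older than 2 years stay in deep storage (HDFS) but are dropped
# from the historical nodes. Host and datasource name are assumptions.
import json
import requests

COORDINATOR = "http://druid-coordinator.example.org:8081"  # assumed host
DATASOURCE = "mediawiki_history"                           # hypothetical name

# Rules are evaluated top-down: load the most recent 2 years on the
# default tier, then drop everything older from the cluster
# (the data is still indexed and kept in deep storage).
rules = [
    {"type": "loadByPeriod", "period": "P2Y",
     "tieredReplicants": {"_default_tier": 2}},
    {"type": "dropForever"},
]

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/rules/{DATASOURCE}",
    data=json.dumps(rules),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
print("Retention rules updated:", resp.status_code)
```

Because rules are applied in order, the `loadByPeriod` rule matches first for recent segments and the `dropForever` fallback only applies to the remainder, which is what keeps at least one full year queryable while bounding historical-node storage.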