This task is about moving the mediawiki_history_reduced dataset from using both raw-events and digests-events to using digests-events only.
Optimizations are needed so that the dataset does not grow exponentially with dimension denormalization, and so that the computation does not generate an overly large intermediate dataset.
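To illustrate the growth concern, here is a minimal sketch of how the number of denormalized keys scales with the number of dimensions. It assumes that denormalization means precomputing an extra "all" rollup value for every dimension, which this task does not state explicitly. The dimension names and values are hypothetical and are not taken from the actual dataset schema.

```python
from itertools import product
from math import prod

# Hypothetical dimensions and values, for illustration only.
DIMENSIONS = {
    "project": ["enwiki", "frwiki", "dewiki"],
    "editor_type": ["user", "anonymous", "bot"],
    "page_type": ["content", "non-content"],
}


def denormalized_keys(dimensions):
    """Return every dimension combination, adding an 'all' value per dimension."""
    value_lists = [values + ["all"] for values in dimensions.values()]
    return list(product(*value_lists))


def denormalized_key_count(dimensions):
    """Count the combinations without materializing them."""
    return prod(len(values) + 1 for values in dimensions.values())


keys = denormalized_keys(DIMENSIONS)
print(len(keys))  # 48 = (3 + 1) * (3 + 1) * (2 + 1)

# Adding one more hypothetical dimension with 4 values multiplies the count by 5.
wider = dict(DIMENSIONS, activity_level=["1-4", "5-24", "25-99", "100+"])
print(denormalized_key_count(wider))  # 240
```

Under this assumption, each added dimension multiplies the key count by its number of values plus one, so the total is exponential in the number of dimensions.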
Description
Details
Project | Branch | Lines +/- | Subject
---|---|---|---
analytics/refinery/source | master | +686 -0 | [SPIKE] Add mediawiki-reduced spark job
Event Timeline
Change 601662 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] [SPIKE] Add mediawiki-reduced spark job