
[Spike] Spark job for digests-only mediawiki-history-reduced
Open, High, Public

Description

This task is about moving the mediawiki_history_reduced dataset from using raw-events and digests-events to digests-events only.
Optimizations are needed so that the dataset does not grow exponentially with dimension denormalization, and so that the computation does not generate an overly large intermediate dataset.
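
To illustrate the growth concern, here is a minimal sketch. It assumes that denormalization means emitting, for each event, one row per combination of "concrete value" or wildcard across its dimensions. The `ALL` wildcard, the `denormalize` helper, and the dimension names are hypothetical and are not taken from the actual job.

```python
from itertools import product

ALL = "all"  # hypothetical wildcard meaning "any value of this dimension"


def denormalize(event):
    """Return one row per combination of concrete value / wildcard.

    For an event with n dimensions this yields 2**n rows, which is the
    exponential growth the optimizations would need to avoid.
    """
    dims = sorted(event)
    # Each dimension contributes two choices: its own value, or the wildcard.
    choices = [(event[d], ALL) for d in dims]
    return [dict(zip(dims, combo)) for combo in product(*choices)]


# Hypothetical event with three dimensions.
event = {
    "project": "en.wikipedia",
    "page_type": "content",
    "user_type": "anonymous",
}
rows = denormalize(event)
print(len(rows))  # 8 rows for 3 dimensions
```

Under this assumption each extra dimension doubles the number of rows per event, so both the final dataset and any intermediate one materialized before aggregation would grow quickly.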