[Spike] Spark job for digests-only mediawiki-history-reduced
Closed, Resolved, Public

Description

This task is about moving the mediawiki_history_reduced dataset from using both raw-events and digests-events to using digests-events only.
Optimizations are needed so that the dataset does not grow exponentially with dimension denormalization, and so that the computation does not generate an overly large intermediate dataset.
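
To illustrate the growth concern, here is a minimal sketch in plain Python (it is not the Spark job from the patch). It assumes, hypothetically, that denormalization expands each dimension of an event into its actual value plus an "all" wildcard, so one event with d dimensions becomes 2^d rows. The dimension names, the `ALL` marker and the `denormalize` helper are invented for illustration only.

```python
from itertools import product

ALL = "all"  # hypothetical wildcard value, an assumption for this sketch


def denormalize(event, dimensions):
    """Yield one row per combination of (actual value | ALL) for each dimension."""
    choices = [(event[d], ALL) for d in dimensions]
    for combo in product(*choices):
        yield dict(zip(dimensions, combo))


# Hypothetical event and dimension names, not taken from the real dataset schema.
event = {"project": "en.wikipedia", "user_type": "anonymous", "page_type": "content"}
dims = ["project", "user_type", "page_type"]

rows = list(denormalize(event, dims))
print(len(rows))  # 2 ** 3 rows for 3 dimensions
```

Under this assumption every extra dimension doubles the row count per event, which is why both the final dataset and any fully expanded intermediate dataset can become very large.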

Event Timeline

Milimetric moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
mforns lowered the priority of this task from High to Medium. Dec 9 2019, 5:23 PM

Change 601662 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] [SPIKE] Add mediawiki-reduced spark job

https://gerrit.wikimedia.org/r/601662

JAllemandou set Final Story Points to 5.
JAllemandou moved this task from Paused to Done on the Analytics-Kanban board.