
[Spike] Spark job for digests-only mediawiki-history-reduced
Open, Medium, Public

Description

This task is about moving the mediawiki_history_reduced dataset from using raw-events and digests-events to digests-events only.
Optimizations are needed so that the dataset does not grow exponentially with dimension denormalization, and so that the computation does not generate an overly large intermediate dataset.
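The task does not spell out how the digests are computed. As one illustration of the two concerns above, assume "dimension denormalization" means that every digest key is emitted once per combination of kept or rolled-up dimensions. Under that assumption, a row with d dimensions produces 2^d keys. The Python sketch below compares expanding every raw event with first aggregating identical dimension tuples and only then expanding the distinct tuples. All names (`ALL`, `expand`, `digests_naive`, `digests_preaggregated`) and the sample dimensions are hypothetical. This is not the code from change 601662.

```python
from collections import Counter
from itertools import product


class _All:
    """Hypothetical sentinel meaning 'rolled up over this dimension'."""

    def __repr__(self):
        return "ALL"


ALL = _All()


def expand(dims):
    """Yield every rollup of one dimension tuple: each dimension is either
    kept or replaced by ALL, so a d-dimension tuple yields 2**d keys."""
    return product(*[(value, ALL) for value in dims])


def digests_naive(events):
    """Expand every raw event before counting.
    Intermediate size: len(events) * 2**d rows."""
    counts = Counter()
    for dims in events:
        for key in expand(dims):
            counts[key] += 1
    return counts


def digests_preaggregated(events):
    """Count identical dimension tuples first, then expand only the
    distinct tuples, adding each tuple's count to every rollup key.
    Intermediate size: (number of distinct tuples) * 2**d rows."""
    base = Counter(events)
    counts = Counter()
    for dims, n in base.items():
        for key in expand(dims):
            counts[key] += n
    return counts


if __name__ == "__main__":
    # Hypothetical sample: (wiki, user type, event type) per raw event.
    events = [
        ("enwiki", "user", "create"),
        ("enwiki", "user", "create"),
        ("enwiki", "bot", "edit"),
        ("frwiki", "user", "edit"),
    ]
    digests = digests_preaggregated(events)
    assert digests == digests_naive(events)
    print(digests[(ALL, ALL, ALL)])       # → 4
    print(digests[("enwiki", ALL, ALL)])  # → 3
```

Both functions return the same digests. The difference lies in the intermediate size. The naive version materializes len(events) × 2^d rows. The pre-aggregated version materializes only (distinct tuples) × 2^d rows, which is much smaller when many events share the same dimension values. The exponential 2^d factor remains in both versions. Only limiting which dimension combinations are emitted would remove it.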

Event Timeline

Restricted Application added a subscriber: Aklapper. Jan 4 2019, 9:56 AM
Milimetric triaged this task as High priority. Jan 7 2019, 4:35 PM
Milimetric moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
mforns lowered the priority of this task from High to Medium. Dec 9 2019, 5:23 PM

Change 601662 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] [SPIKE] Add mediawiki-reduced spark job

https://gerrit.wikimedia.org/r/601662

JAllemandou set Final Story Points to 5.
JAllemandou moved this task from Paused to Done on the Analytics-Kanban board.
Nuria moved this task from Done to Paused on the Analytics-Kanban board. Jun 3 2020, 3:56 PM