Improve webrequest-refine shuffle-sort
Closed, Resolved (Public)

Description

Webrequest-refine currently shuffles both raw and augmented data to enforce that rows are distinct. Because augmented values are computed deterministically from the raw data, distinctness can be enforced on the raw data alone. This avoids shuffling augmented data between mappers and reducers, which reduces network and disk I/O.
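
A minimal Python sketch of the idea, not the actual refinery query: the field names (`hostname`, `sequence`, `uri_path`, `is_pageview`) and the `augment` function are hypothetical, and a Python `set` stands in for the shuffle-based distinct. It only illustrates that, when augmentation is a deterministic function of the raw row, deduplicating before augmenting yields the same rows as deduplicating after, while the deduplication step handles narrower rows.

```python
from typing import Iterable, NamedTuple, Set


class RawRow(NamedTuple):
    # Hypothetical raw columns, for illustration only.
    hostname: str
    sequence: int
    uri_path: str


class RefinedRow(NamedTuple):
    # Raw columns plus one hypothetical augmented column.
    hostname: str
    sequence: int
    uri_path: str
    is_pageview: bool


def augment(row: RawRow) -> RefinedRow:
    """Deterministic augmentation: the output depends only on the raw row."""
    return RefinedRow(*row, is_pageview=row.uri_path.startswith("/wiki/"))


def distinct_after_augment(rows: Iterable[RawRow]) -> Set[RefinedRow]:
    # Previous shape: augment every row, then deduplicate the wide rows.
    return {augment(r) for r in rows}


def distinct_before_augment(rows: Iterable[RawRow]) -> Set[RefinedRow]:
    # Proposed shape: deduplicate the narrow raw rows, then augment survivors.
    return {augment(r) for r in set(rows)}


raw_rows = [
    RawRow("cp1001", 1, "/wiki/Main_Page"),
    RawRow("cp1001", 1, "/wiki/Main_Page"),  # exact duplicate
    RawRow("cp1002", 7, "/w/api.php"),
]

# Both orderings produce the same set of refined rows.
same = distinct_after_augment(raw_rows) == distinct_before_augment(raw_rows)
```

In this sketch the second variant deduplicates three-column rows instead of four-column ones. In a real job the saving would scale with the number and size of the augmented columns that no longer need to cross the shuffle.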

Event Timeline

JAllemandou set Final Story Points to 3.

Change 638086 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Improve webrequest-refine query shuffle stage

https://gerrit.wikimedia.org/r/638086

Change 638086 merged by Joal:
[analytics/refinery@master] Improve webrequest-refine query

https://gerrit.wikimedia.org/r/638086