Update skewed-join strategy in Mediawiki-history to prevent errors in case of task-retry
Closed, Resolved (Public)

Description

In mediawiki-history we use a skewed-join trick to join revision data with comments and actors. This trick partitions the skewed data using a random number, which can lead to errors if some task(s) in the join step fail and are retried: the assignment of partitioned data to workers is non-deterministic, so recomputed partitioned rows might end up on different workers than the ones they were originally assigned to.

Suggested solution is to use revision_id % number_of_splits as a deterministic number for the revision instead of a random number.
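
A minimal sketch of the idea in plain Python, not the actual refinery code: the function names, the value of `NUM_SPLITS`, and the sample data are all hypothetical. It shows that a salt derived from `revision_id % number_of_splits` yields the same salted join key each time a row is recomputed, which a random salt would not guarantee.

```python
NUM_SPLITS = 4  # hypothetical number of splits for skewed keys


def deterministic_salt(revision_id, num_splits=NUM_SPLITS):
    """Same revision always lands in the same split, even when recomputed."""
    return revision_id % num_splits


def salted_join(revisions, comments, num_splits=NUM_SPLITS):
    """Join (revision_id, comment_id) rows with comments on a salted key."""
    # Replicate each comment once per split so every salted key finds a match.
    replicated = {
        (comment_id, split): text
        for comment_id, text in comments.items()
        for split in range(num_splits)
    }
    joined = []
    for revision_id, comment_id in revisions:
        key = (comment_id, deterministic_salt(revision_id, num_splits))
        joined.append((revision_id, key, replicated[key]))
    return joined


# Ten revisions that all share one (skewed) comment id.
revisions = [(rev_id, 7) for rev_id in range(100, 110)]
comments = {7: "typo fix"}

first_attempt = salted_join(revisions, comments)
retried_attempt = salted_join(revisions, comments)  # simulates a task retry
```

Here `first_attempt` and `retried_attempt` are identical, and the ten skewed rows are still spread over all four splits. With a random salt, the retried computation could assign a row to a different split than the first attempt did.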

Event Timeline

+1 to the rev_id instead of the random number

fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Change 608567 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Make mediawiki_history skewed join deterministic

https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/608567

Change 608567 merged by Ottomata:
[analytics/refinery/source@master] Make mediawiki_history skewed join deterministic

https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/608567

Change 608665 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Bump mediawiki-history-denormalize jar version

https://gerrit.wikimedia.org/r/c/analytics/refinery/+/608665

Change 608665 merged by Ottomata:
[analytics/refinery@master] Bump mediawiki-history-denormalize jar version

https://gerrit.wikimedia.org/r/c/analytics/refinery/+/608665

Change 609465 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Fix mediawiki-history skewed join bug

https://gerrit.wikimedia.org/r/609465

Change 609465 merged by jenkins-bot:
[analytics/refinery/source@master] Fix mediawiki-history skewed join bug

https://gerrit.wikimedia.org/r/609465

Change 613655 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update mediawiki-history-denormalize job jar version

https://gerrit.wikimedia.org/r/613655

Change 613655 merged by Mforns:
[analytics/refinery@master] Update mediawiki-history-denormalize job jar version

https://gerrit.wikimedia.org/r/613655

Nuria changed Final Story Points from 1 to 3.