Update skewed-join strategy in Mediawiki-history to prevent errors in case of task-retry
Closed, Resolved · Public

Description

In mediawiki-history we use a skewed-join trick to join revision data with comments and actors. The trick partitions skewed data using a random number, which can lead to errors if one or more tasks in the join step fail: the assignment of partitioned data to workers is non-deterministic, so recomputed rows might end up on different workers than the ones they were originally assigned to.

The suggested solution is to use revision_id % number_of_splits as a deterministic split number for each revision instead of a random number.
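A minimal sketch of the idea, in plain Python rather than the job's actual code. The function names, the tuple-shaped keys, and the value of NUMBER_OF_SPLITS are all hypothetical and only illustrate the principle: the split number becomes a pure function of the row, so a retried task recomputes the same salted key.

```python
import random

# Hypothetical value; the real job's number of splits is not stated in this task.
NUMBER_OF_SPLITS = 4


def random_split(number_of_splits=NUMBER_OF_SPLITS):
    """Non-deterministic salt: recomputing a row after a task retry
    may yield a different split than the first attempt."""
    return random.randrange(number_of_splits)


def deterministic_split(revision_id, number_of_splits=NUMBER_OF_SPLITS):
    """Deterministic salt: the same revision_id always maps to the same split."""
    return revision_id % number_of_splits


def salted_key(join_key, revision_id, number_of_splits=NUMBER_OF_SPLITS):
    """Key for the large, skewed side (revisions): join key plus its split."""
    return (join_key, deterministic_split(revision_id, number_of_splits))


def replicate_small_side(join_key, value, number_of_splits=NUMBER_OF_SPLITS):
    """The small side (e.g. an actor or comment row) is copied once per split,
    so every salted revision key finds a matching row."""
    return [((join_key, split), value) for split in range(number_of_splits)]
```

With this scheme, recomputing a revision row always produces the same salted key, and that key is always present among the replicated rows of the other side, whichever worker runs the retry.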

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 16 2020, 9:51 AM
JAllemandou updated the task description. · Jun 16 2020, 10:16 AM

+1 to the rev_id instead of the random number

fdans triaged this task as High priority. · Jun 18 2020, 3:58 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Change 608567 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Make mediawiki_history skewed join deterministic

https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/608567

JAllemandou moved this task from Next Up to In Code Review on the Analytics-Kanban board.
JAllemandou set Final Story Points to 1. · Jun 30 2020, 9:51 AM

Change 608567 merged by Ottomata:
[analytics/refinery/source@master] Make mediawiki_history skewed join deterministic

https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/608567

Change 608665 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Bump mediawiki-history-denormalize jar version

https://gerrit.wikimedia.org/r/c/analytics/refinery/+/608665

Change 608665 merged by Ottomata:
[analytics/refinery@master] Bump mediawiki-history-denormalize jar version

https://gerrit.wikimedia.org/r/c/analytics/refinery/+/608665

Change 609465 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Fix mediawiki-history skewed join bug

https://gerrit.wikimedia.org/r/609465

Change 609465 merged by jenkins-bot:
[analytics/refinery/source@master] Fix mediawiki-history skewed join bug

https://gerrit.wikimedia.org/r/609465

Change 613655 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update mediawiki-history-denormalize job jar version

https://gerrit.wikimedia.org/r/613655

Change 613655 merged by Mforns:
[analytics/refinery@master] Update mediawiki-history-denormalize job jar version

https://gerrit.wikimedia.org/r/613655

Nuria closed this task as Resolved. · Jul 23 2020, 4:39 AM
Nuria changed Final Story Points from 1 to 3.