In mediawiki-history we use a skewed-join trick to join revision data with comments and actors. The trick partitions the skewed data using a random number, which can lead to errors when one or more tasks of the join step fail: the assignment of partitioned data to workers is non-deterministic, so recomputed rows might end up on different workers than the ones they were originally assigned to.
The suggested solution is to replace the random number with revision_id % number_of_splits, which gives each revision a deterministic split number.
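
The difference between the two approaches can be illustrated with a small plain-Python sketch. It is not the mediawiki-history code: the function names, the value of NUMBER_OF_SPLITS and the use of a seed to stand in for "a different task attempt" are all assumptions made for illustration.

```python
import random

# Hypothetical number of splits used to spread a skewed key; the real value may differ.
NUMBER_OF_SPLITS = 8


def compute_splits_random(revision_ids, seed):
    """Assign each revision a random split, as in the current approach.

    The seed stands in for a task attempt (assumption): a recomputed task
    is not guaranteed to draw the same numbers as the original one.
    """
    rng = random.Random(seed)
    return {rev_id: rng.randrange(NUMBER_OF_SPLITS) for rev_id in revision_ids}


def compute_splits_deterministic(revision_ids):
    """Assign each revision the split revision_id % NUMBER_OF_SPLITS.

    The result depends only on the row itself, so a recomputed task
    always produces the same assignment.
    """
    return {rev_id: rev_id % NUMBER_OF_SPLITS for rev_id in revision_ids}


if __name__ == "__main__":
    revision_ids = [101, 102, 103, 104, 105]

    # Two attempts of the random version may assign rows to different splits.
    print(compute_splits_random(revision_ids, seed=1))
    print(compute_splits_random(revision_ids, seed=2))

    # Two attempts of the deterministic version always agree.
    first_attempt = compute_splits_deterministic(revision_ids)
    retry = compute_splits_deterministic(revision_ids)
    print(first_attempt == retry)
```

One thing to keep in mind when porting this idea to Scala or Java: there, % returns a negative result for a negative left operand, so the sketch relies on revision_id being non-negative.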