
Mediawiki History delayed 2022-05
Closed, Resolved, Public

Description

This month we're having multiple problems with the mw history data pipeline. First, the sqoop jobs that pull in the source data failed while trying to use views on the Cloud replica. Those views had been broken by changes in production.

Once the sqoop problems were fixed late last week, the mw history denormalize job itself failed twice with this cryptic error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: A shuffle map stage with indeterminate output was failed and retried. However, Spark cannot rollback the ShuffleMapStage 986 to re-process the input data, and has to fail this job. Please eliminate the indeterminacy by checkpointing the RDD before repartition and try again.
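For readers unfamiliar with the error above, here is a toy, Spark-free sketch in plain Python of why an indeterminate input order is a problem for a position-based (round-robin) repartition, and why reading from a checkpointed copy avoids it. This is not the mw history job's code and it makes no Spark calls; `round_robin`, `upstream`, and the sample rows are hypothetical names invented for illustration, and reversing the rows on the retry is just an assumed stand-in for an upstream stage whose output order can differ between attempts.

```python
def round_robin(rows, num_partitions):
    """Assign each row to a partition based only on its position in the input."""
    parts = [[] for _ in range(num_partitions)]
    for position, row in enumerate(rows):
        parts[position % num_partitions].append(row)
    return parts


def upstream(attempt):
    """Hypothetical upstream stage: same rows, but the order depends on the attempt."""
    rows = ["a", "b", "c", "d", "e", "f"]
    return rows if attempt == 0 else list(reversed(rows))


# Without a checkpoint: the retry recomputes upstream and sees a different order,
# so rows land in different partitions than on the first attempt.
first = round_robin(upstream(0), 2)
retry = round_robin(upstream(1), 2)
# A downstream stage that already read partition 0 from the first attempt and then
# reads partition 1 from the retry sees some rows twice and others not at all.
mixed = first[0] + retry[1]

# With a checkpoint: upstream is materialized once and both attempts read that copy,
# so the partition assignment is identical on retry.
checkpointed = upstream(0)
first_cp = round_robin(checkpointed, 2)
retry_cp = round_robin(checkpointed, 2)
mixed_cp = first_cp[0] + retry_cp[1]

print(sorted(mixed))
print(sorted(mixed_cp))
```

In this toy model, mixing attempts without a checkpoint duplicates some rows and drops others, which is the kind of silent corruption the Spark error message appears to be guarding against by failing the job instead.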

This task will track problems and their solutions until we get this month's snapshot deployed and all dependent jobs cleared.

Event Timeline

Milimetric renamed this task from Mediawiki History delayed 2022-06 to Mediawiki History delayed 2022-05. Jun 6 2022, 7:43 PM

Change 803551 had a related patch set uploaded (by Milimetric; author: Milimetric):

[analytics/refinery@master] Increase resources for history job

https://gerrit.wikimedia.org/r/803551

Change 805446 had a related patch set uploaded (by Milimetric; author: Milimetric):

[analytics/refinery@master] Update mediawiki history pipeline

https://gerrit.wikimedia.org/r/805446

Change 805446 merged by Joal:

[analytics/refinery@master] Update mediawiki history pipeline

https://gerrit.wikimedia.org/r/805446

Change 803551 abandoned by Milimetric:

[analytics/refinery@master] Increase resources for history job

Reason:

I was sure we deployed this; there must be another, similar change.

https://gerrit.wikimedia.org/r/803551