This month we've hit multiple problems with the mw history data pipeline. First, the sqoop jobs that pull in the source data failed while trying to use views on the Cloud replica; those views had in turn been broken by changes in production.
Once the sqoop problems were fixed late last week, the mw history denormalize job itself failed twice with this cryptic error: "Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: A shuffle map stage with indeterminate output was failed and retried. However, Spark cannot rollback the ShuffleMapStage 986 to re-process the input data, and has to fail this job. Please eliminate the indeterminacy by checkpointing the RDD before repartition and try again."
This task will track these problems and their solutions until this month's snapshot is deployed and all dependent jobs are cleared.