Fatal Error - Oozie Job mediawiki-wikitext-history-wf-2020-10
The convert_xml_to_parquet action of this job appears to have failed, likely due to OutOfMemory errors.
https://hue.wikimedia.org/oozie/list_oozie_workflow/0001280-201127102807975-oozie-oozi-W/
(Coordinators: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0001955-201103154415936-oozie-oozi-C/)
The Yarn application id for this action is application_1605880843685_28649, and the logs have several instances of java.lang.OutOfMemoryError: Java heap space.
E.g.

```
2020-11-28 23:47:28,918 [Executor task launch worker for task 15614] INFO org.apache.spark.util.collection.ExternalSorter - Thread 97 spilling in-memory map of 4.2 GB to disk (2 times so far)
2020-11-28 23:48:43,946 [Executor task launch worker for task 15665] INFO org.apache.spark.util.collection.ExternalSorter - Thread 98 spilling in-memory map of 4.2 GB to disk (1 time so far)
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
#   Executing /bin/sh -c "kill 19413"...
2020-11-28 23:54:47,807 [SIGTERM handler] ERROR org.apache.spark.executor.CoarseGrainedExecutorBackend - RECEIVED SIGNAL TERM
2020-11-28 23:54:48,928 [Executor task launch worker for task 15498] ERROR org.apache.spark.executor.Executor - Exception in task 1420.3 in stage 396.0 (TID 15498)
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.StringBuffer.toString(StringBuffer.java:669)
    at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:580)
    at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:695)
    at org.wikimedia.wikihadoop.xmlparser.MediawikiXMLParser.parseRevision_rec(MediawikiXMLParser.scala:147)
    at org.wikimedia.wikihadoop.xmlparser.MediawikiXMLParser.parseRevision(MediawikiXMLParser.scala:126)
    at org.wikimedia.wikihadoop.newapi.MediawikiXMLRevisionInputFormat$MediawikiXMLRevisionRecordReader.readNextRevision(MediawikiXMLRevisionInputFormat.scala:259)
    at org.wikimedia.wikihadoop.newapi.MediawikiXMLRevisionInputFormat$MediawikiXMLRevisionRecordReader.nextKeyValue(MediawikiXMLRevisionInputFormat.scala:208)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
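The stack trace shows the heap filling up while ExternalSorter buffers shuffle data during XML-revision parsing, so one possible mitigation (a sketch, not a confirmed fix for this job) would be to raise the executor heap and off-heap overhead for a rerun. The flags below are standard Spark options; the specific values, and how they are plumbed into this workflow's spark action, are assumptions:

```
# Hypothetical re-run settings; actual values and the workflow's
# spark-opts wiring would need to be checked against the job config.
--executor-memory 12G                          # larger heap for the in-memory sort buffers
--conf spark.executor.memoryOverhead=2048      # extra off-heap headroom per executor
--conf spark.memory.fraction=0.6               # default; lower it if user memory is the bottleneck
```

Alternatively, increasing the number of shuffle partitions (`spark.sql.shuffle.partitions` or the RDD-level partition count) would shrink each task's in-memory map and may avoid the OOM without larger executors.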