
mediawiki-wikitext-history-2020-10 failed
Closed, Resolved, Public

Description

Fatal Error - Oozie Job mediawiki-wikitext-history-wf-2020-10

It seems the convert_xml_to_parquet action of this job failed, possibly due to OutOfMemory errors.

https://hue.wikimedia.org/oozie/list_oozie_workflow/0001280-201127102807975-oozie-oozi-W/

(Coordinators: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0001955-201103154415936-oozie-oozi-C/)

The Yarn application id for this action is application_1605880843685_28649, and the logs have several instances of java.lang.OutOfMemoryError: Java heap space.

E.g.

2020-11-28 23:47:28,918 [Executor task launch worker for task 15614] INFO  org.apache.spark.util.collection.ExternalSorter  - Thread 97 spilling in-memory map of 4.2 GB to disk (2 times so far)
2020-11-28 23:48:43,946 [Executor task launch worker for task 15665] INFO  org.apache.spark.util.collection.ExternalSorter  - Thread 98 spilling in-memory map of 4.2 GB to disk (1 time so far)
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
#   Executing /bin/sh -c "kill 19413"...
2020-11-28 23:54:47,807 [SIGTERM handler] ERROR org.apache.spark.executor.CoarseGrainedExecutorBackend  - RECEIVED SIGNAL TERM
2020-11-28 23:54:48,928 [Executor task launch worker for task 15498] ERROR org.apache.spark.executor.Executor  - Exception in task 1420.3 in stage 396.0 (TID 15498)
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.StringBuffer.toString(StringBuffer.java:669)
	at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:580)
	at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:695)
	at org.wikimedia.wikihadoop.xmlparser.MediawikiXMLParser.parseRevision_rec(MediawikiXMLParser.scala:147)
	at org.wikimedia.wikihadoop.xmlparser.MediawikiXMLParser.parseRevision(MediawikiXMLParser.scala:126)
	at org.wikimedia.wikihadoop.newapi.MediawikiXMLRevisionInputFormat$MediawikiXMLRevisionRecordReader.readNextRevision(MediawikiXMLRevisionInputFormat.scala:259)
	at org.wikimedia.wikihadoop.newapi.MediawikiXMLRevisionInputFormat$MediawikiXMLRevisionRecordReader.nextKeyValue(MediawikiXMLRevisionInputFormat.scala:208)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:230)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
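The top frames of the trace show Woodstox turning a revision's element text into a String through StringBuffer.toString(), which calls Arrays.copyOfRange. The following is a rough lower bound on the heap that this single copy needs. It assumes a Java 8 runtime, where String and StringBuffer store text as UTF-16 char arrays (2 bytes per character), and it ignores object headers and any other buffers the parser may hold. The function name and the example size are hypothetical. It only illustrates that, at that moment, a revision's text costs at least 4 bytes per character. It does not identify which revision, if any, triggered the failure.

```python
from typing import Optional

# Assumption: Java 8 runtime, where String and StringBuffer keep text in a
# UTF-16 char[] (2 bytes per char). Object headers are ignored.
BYTES_PER_CHAR = 2


def min_bytes_at_to_string(text_chars: int, buffer_capacity: Optional[int] = None) -> int:
    """Hypothetical helper: lower bound on heap live during StringBuffer.toString().

    The buffer's backing array (capacity >= length) is still reachable while
    Arrays.copyOfRange allocates a second array holding exactly `text_chars`.
    Any other copies held by the XML parser are not counted.
    """
    capacity = text_chars if buffer_capacity is None else buffer_capacity
    if capacity < text_chars:
        raise ValueError("buffer capacity cannot be smaller than its content")
    return BYTES_PER_CHAR * (capacity + text_chars)


# Hypothetical 2-million-character revision text, buffer grown to twice its length.
print(min_bytes_at_to_string(2_000_000, 4_000_000))
```

This bound applies per task. Several tasks parse revisions concurrently inside the same executor JVM, so the per-task costs add up on one shared heap.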

Event Timeline

Hm, this is not cool. Let's see whether the job works if we reduce the number of cores per executor to 3 while keeping the same amount of RAM. I'll also increase the number of executors a bit, as the cluster is more powerful than it was.
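The reasoning in the comment above (fewer cores per executor for the same RAM, plus more executors) can be sketched as simple arithmetic. All tasks running in one executor share a single JVM heap. With a fixed heap, fewer cores means fewer concurrent tasks competing for it, and adding executors keeps overall parallelism up. The log above shows individual tasks spilling in-memory maps of about 4.2 GB, which gives a sense of the per-task footprint. The heap size, executor counts, and the "before" core count below are hypothetical and are not the job's real settings. Spark also shares heap among tasks dynamically rather than in fixed slices, so the per-task figure is only an average.

```python
"""Hypothetical sizing sketch: how executor cores split a fixed executor heap."""


def heap_per_task_gb(executor_heap_gb: float, executor_cores: int) -> float:
    """Average heap available to each concurrently running task in one executor."""
    if executor_cores < 1:
        raise ValueError("executor_cores must be >= 1")
    return executor_heap_gb / executor_cores


def total_task_slots(num_executors: int, executor_cores: int) -> int:
    """Number of tasks the job can run at once across all executors."""
    return num_executors * executor_cores


if __name__ == "__main__":
    heap_gb = 24.0  # hypothetical executor heap, kept fixed in both scenarios
    for executors, cores in [(64, 4), (96, 3)]:  # hypothetical before/after
        print(
            f"{executors} executors x {cores} cores: "
            f"{heap_per_task_gb(heap_gb, cores):.1f} GB/task, "
            f"{total_task_slots(executors, cores)} concurrent tasks"
        )
```

With these illustrative numbers, dropping from 4 to 3 cores raises the average share from 6.0 to 8.0 GB per task. Going from 64 to 96 executors still increases the number of concurrent tasks, from 256 to 288.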

Change 644510 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Grow mediawiki-wikitext-history spark job

https://gerrit.wikimedia.org/r/644510

Milimetric triaged this task as High priority.
Milimetric moved this task from Incoming to Datasets on the Analytics board.
Milimetric added a project: Analytics-Kanban.

Change 657053 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] [WIP] Fix wikitext history job

https://gerrit.wikimedia.org/r/657053

Change 667932 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Bump jar version of wikitext oozie jobs

https://gerrit.wikimedia.org/r/667932

Change 657053 merged by jenkins-bot:
[analytics/refinery/source@master] Fix wikitext history job

https://gerrit.wikimedia.org/r/657053

Change 667932 merged by Mforns:
[analytics/refinery@master] Bump jar version of wikitext oozie jobs

https://gerrit.wikimedia.org/r/667932