JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history [13 pts] {paon}
Closed, Resolved (Public)


job_1442877556644_0009 on the Wikimedia altiscale cluster

Event Timeline

Halfak set the priority of this task to Needs Triage.
Halfak updated the task description.
Halfak added a project: Analytics-Backlog.
Halfak moved this task to Incoming on the Analytics-Backlog board.
Halfak added a subscriber: Halfak.
Halfak set Security to None.

Looks like we were running out of memory in the reducer. The job ran for about 36.5 hours before arriving at this failed state.

Job Name: 	org.wikimedia.wikihadoop.job.JsonRevisionsSortedPerPage$: MediaWikiRevisionXMLToJSONInputFormat(/user/halfak/stream... ID=1 (1/1)
User Name: 	halfak
Queue: 	default
State: 	FAILED
Uberized: 	false
Submitted: 	Tue Sep 29 23:08:31 UTC 2015
Started: 	Tue Sep 29 23:08:39 UTC 2015
Finished: 	Thu Oct 01 11:43:25 UTC 2015
Elapsed: 	36hrs, 34mins, 45sec
Task failed task_1442877556644_0009_r_000274
Job failed as tasks failed. failedMaps:0 failedReduces:1
Average Map Time 	29mins, 6sec
Average Reduce Time 	41mins, 50sec
Average Shuffle Time 	17mins, 15sec
Average Merge Time 	4sec
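
As a sanity check on the figures above, the Elapsed value can be recomputed from the Started and Finished timestamps. This is only a sketch: it assumes a `date` that accepts `-d` (GNU coreutils or BusyBox), and the helper names `to_epoch` and `format_elapsed` are made up for illustration.

```shell
# Convert a "YYYY-MM-DD hh:mm:ss" UTC timestamp to epoch seconds.
# Assumption: `date` supports -d (GNU coreutils or BusyBox).
to_epoch() {
  date -u -d "$1" +%s
}

# Format a duration in seconds in the same style as the job summary.
format_elapsed() {
  printf '%dhrs, %dmins, %dsec\n' \
    "$(( $1 / 3600 ))" "$(( ($1 % 3600) / 60 ))" "$(( $1 % 60 ))"
}

# Started / Finished values copied from the job summary.
started=$(to_epoch "2015-09-29 23:08:39")
finished=$(to_epoch "2015-10-01 11:43:25")
elapsed=$(( finished - started ))

format_elapsed "$elapsed"
```

Working from whole-second timestamps this gives 36hrs, 34mins, 46sec, one second off the reported Elapsed value, presumably because the job history page works from sub-second timestamps.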

Here's the command I ran:

hadoop jar ~/jars/wikihadoop-0.2.jar \
  org.wikimedia.wikihadoop.job.JsonRevisionsSortedPerPage \
  -i /user/halfak/streaming/enwiki-20150901/xml-bz2 \
  -o /user/halfak/streaming/enwiki-20150901/revdocs-bz2 \
  -r 2000

I looked at the logs: the failure seems to be an interruption exception.
If so, the issue may be caused by a task timeout.
The job has a parameter for this (with a typo in its name ...) that defaults to 1800000 ms (30 min); it can be raised to 3600000 ms (1 h).
Also, the number of reducers could be increased a bit (2000 is not that many).
I'd like to see if the following run works:

hadoop jar ~/jars/wikihadoop-0.2.jar \
  org.wikimedia.wikihadoop.job.JsonRevisionsSortedPerPage \
  -i /user/halfak/streaming/enwiki-20150901/xml-bz2 \
  -o /user/halfak/streaming/enwiki-20150901/revdocs-bz2 \
  -r 5000 \
  --task-tiemout 3600000

Let's talk about that today.
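
For reference, the arithmetic behind the numbers in this comment can be sketched as follows. The function names are hypothetical, and the per-reducer estimate assumes keys are spread evenly across reducers, which a single failing reducer may well contradict.

```shell
# Convert a timeout given in milliseconds to whole minutes.
ms_to_minutes() {
  echo $(( $1 / 60000 ))
}

# Rough per-reducer load after changing the reducer count, as a percentage
# of the previous load. Assumption: keys are evenly distributed.
relative_load_percent() {
  old_reducers=$1
  new_reducers=$2
  echo $(( old_reducers * 100 / new_reducers ))
}

ms_to_minutes 1800000            # default timeout
ms_to_minutes 3600000            # proposed timeout
relative_load_percent 2000 5000  # going from -r 2000 to -r 5000
```

Under those assumptions the timeouts come out as 30 and 60 minutes, and each reducer would handle roughly 40% of the data it handled with 2000 reducers.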

JAllemandou renamed this task from JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history to JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history [13 pts] {paon}. (Nov 19 2015, 4:50 PM)
JAllemandou moved this task from Paused to In Progress on the Analytics-Kanban board.

I tested various memory settings; each run failed.

I finally rewrote the job using the core MapReduce API instead of Scrunch.

The job is still running, with no errors so far.