JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history [13 pts] {paon}
Closed, Resolved, Public

Description

job_1442877556644_0009 on the Wikimedia altiscale cluster

Event Timeline

Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak added a project: Analytics-Backlog.
Halfak moved this task to Incoming on the Analytics-Backlog board.
Halfak subscribed.
Halfak set Security to None.

Looks like we were running out of memory in the reducer. The job ran for more than 36 hours before arriving at this failed state.

Job Name: 	org.wikimedia.wikihadoop.job.JsonRevisionsSortedPerPage$: MediaWikiRevisionXMLToJSONInputFormat(/user/halfak/stream... ID=1 (1/1)
User Name: 	halfak
Queue: 	default
State: 	FAILED
Uberized: 	false
Submitted: 	Tue Sep 29 23:08:31 UTC 2015
Started: 	Tue Sep 29 23:08:39 UTC 2015
Finished: 	Thu Oct 01 11:43:25 UTC 2015
Elapsed: 	36hrs, 34mins, 45sec
Diagnostics: 	
Task failed task_1442877556644_0009_r_000274
Job failed as tasks failed. failedMaps:0 failedReduces:1
Average Map Time 	29mins, 6sec
Average Reduce Time 	41mins, 50sec
Average Shuffle Time 	17mins, 15sec
Average Merge Time 	4sec
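
As a quick sanity check on the run time, the elapsed figure can be recomputed from the Started and Finished timestamps in the summary above. This is only a minimal sketch: the helper names `elapsed_seconds` and `as_hms` are made up for illustration, and the timestamp format string is an assumption based on how the values are printed here.

```python
from datetime import datetime

# Assumed layout of the timestamps in the job summary,
# e.g. "Tue Sep 29 23:08:39 UTC 2015".
FMT = "%a %b %d %H:%M:%S UTC %Y"


def elapsed_seconds(started: str, finished: str) -> int:
    """Whole seconds between two job-summary timestamps (hypothetical helper)."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return int(delta.total_seconds())


def as_hms(seconds: int) -> str:
    """Format a duration the same way the job summary does."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}hrs, {minutes}mins, {secs}sec"


started = "Tue Sep 29 23:08:39 UTC 2015"
finished = "Thu Oct 01 11:43:25 UTC 2015"
print(as_hms(elapsed_seconds(started, finished)))
```

With whole-second timestamps this comes to 36 hours, 34 minutes and 46 seconds, within a second of the Elapsed value reported by the cluster (the small difference is presumably sub-second rounding).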

Here's the command I ran:

hadoop jar ~/jars/wikihadoop-0.2.jar \
  org.wikimedia.wikihadoop.job.JsonRevisionsSortedPerPage \
  -i /user/halfak/streaming/enwiki-20150901/xml-bz2 \
  -o /user/halfak/streaming/enwiki-20150901/revdocs-bz2 \
  -r 2000

Looked at the logs: it seems to be an interruption exception.
If so, there is a good chance the failure comes from a task timeout.
There is a parameter in the job (with a typo ...) that defaults to 1800000 ms (30 min); it can be raised to 3600000 ms (1 h).
Also, the number of reducers could be raised a bit (2000 is not that many).
I'd like to see if the following run works:

hadoop jar ~/jars/wikihadoop-0.2.jar \
  org.wikimedia.wikihadoop.job.JsonRevisionsSortedPerPage \
  -i /user/halfak/streaming/enwiki-20150901/xml-bz2 \
  -o /user/halfak/streaming/enwiki-20150901/revdocs-bz2 \
  -r 5000 \
  --task-tiemout 3600000
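
For reference, the two timeout values can be double-checked with a bit of arithmetic, assuming the parameter is expressed in milliseconds (which is what the 1/2h and 1h figures imply). The constant and function names below are illustrative only and do not come from the job itself.

```python
# Assumption: the timeout parameter is given in milliseconds.
MS_PER_MINUTE = 60 * 1000

DEFAULT_TIMEOUT_MS = 1_800_000   # default mentioned in this thread
PROPOSED_TIMEOUT_MS = 3_600_000  # value proposed for the next run


def ms_to_minutes(ms: int) -> float:
    """Convert a millisecond timeout to minutes (hypothetical helper)."""
    return ms / MS_PER_MINUTE


for label, value in (("default", DEFAULT_TIMEOUT_MS), ("proposed", PROPOSED_TIMEOUT_MS)):
    print(f"{label}: {value} ms = {ms_to_minutes(value):.0f} min")
```

So the proposed run doubles the window from 30 to 60 minutes before a silent task is considered timed out.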

Let's talk about that today.

JAllemandou renamed this task from JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history to JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history [13 pts] {paon}. Nov 19 2015, 4:50 PM
JAllemandou moved this task from Paused to In Progress on the Analytics-Kanban board.

I tested various memory settings; each one failed.

I finally went and rewrote the job using the core MapReduce API instead of Scrunch.

The job is still running, but no errors so far.