Maniphest T172018

Cannot request more than 4 cores per spark executor
Closed, Resolved · Public · 1 Estimated Story Points

Assigned To

Authored By

	EBernhardson
	Jul 28 2017, 10:36 PM

Description

A Spark job started with --executor-cores greater than 4 launches, but YARN never assigns it any executors. The limit appears to come from the configuration key yarn.scheduler.maximum-allocation-vcores:

ebernhardson@stat1005:~$ hdfs getconf -confKey yarn.scheduler.maximum-allocation-vcores
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
4
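A minimal Python sketch of how the check above could be automated before submitting a job. The function names (`parse_getconf_output`, `read_max_vcores`, `fits_yarn_limit`) are hypothetical and not part of any Hadoop or Spark tooling; the only external call is the same `hdfs getconf -confKey` command shown above, and the sketch assumes the value appears as a bare integer on its own line.

```python
import subprocess

YARN_MAX_VCORES_KEY = "yarn.scheduler.maximum-allocation-vcores"


def parse_getconf_output(output: str) -> int:
    """Return the integer value from `hdfs getconf` output.

    Extra lines such as "Picked up JAVA_TOOL_OPTIONS: ..." are skipped;
    the last line that is a plain integer is taken as the value.
    """
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.isdigit():
            return int(line)
    raise ValueError("no integer value found in getconf output")


def read_max_vcores() -> int:
    """Query the cluster config (assumes the `hdfs` CLI is on PATH)."""
    result = subprocess.run(
        ["hdfs", "getconf", "-confKey", YARN_MAX_VCORES_KEY],
        capture_output=True, text=True, check=True,
    )
    return parse_getconf_output(result.stdout)


def fits_yarn_limit(executor_cores: int, max_vcores: int) -> bool:
    """True if the requested cores per executor stay within the YARN cap."""
    return 1 <= executor_cores <= max_vcores
```

With the output shown above, `parse_getconf_output` yields 4, so a request for 8 cores per executor would be flagged before the job is submitted rather than waiting on executors that never arrive.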

I'd like to experiment with different values to find the most efficient use of resources when training ML models. Fewer executors with more cores each may (or may not) be more efficient in terms of total CPU time used. To find out, I need to be able to test it.
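One way such an experiment could be laid out, as a sketch only: hold the total core count fixed and enumerate the ways to split it into executors, skipping any split that exceeds the YARN per-container vcore cap. The helper names and the `train.py` application are hypothetical; `--master`, `--num-executors` and `--executor-cores` are standard spark-submit options, and the sketch assumes a fixed executor count rather than dynamic allocation.

```python
def executor_layouts(total_cores: int, max_vcores: int):
    """List (num_executors, executor_cores) pairs that use exactly
    total_cores, with executor_cores no larger than max_vcores."""
    layouts = []
    for cores in range(1, min(total_cores, max_vcores) + 1):
        if total_cores % cores == 0:
            layouts.append((total_cores // cores, cores))
    return layouts


def spark_submit_args(num_executors: int, executor_cores: int, app: str):
    """Build a spark-submit argument list for one layout."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        app,  # hypothetical application, e.g. "train.py"
    ]


# With a cap of 4, a 32-core budget can only be split three ways.
for n, c in executor_layouts(32, 4):
    print(" ".join(spark_submit_args(n, c, "train.py")))
```

Raising the cap passed as `max_vcores` adds the wider layouts (for example 4 executors with 8 cores each) to the sweep.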

Details

	Subject	Repo	Branch	Lines +/-
	Set maximum yarn vcore allocation to 32	operations/puppet	production	+1 -0

Event Timeline

EBernhardson created this task. Jul 28 2017, 10:36 PM

Restricted Application added a project: Analytics. Jul 28 2017, 10:36 PM

Restricted Application added a subscriber: Aklapper.

mforns edited projects, added Analytics-Kanban; removed Analytics. Jul 31 2017, 3:29 PM

Ottomata claimed this task. Jul 31 2017, 3:33 PM

Ottomata set the point value for this task to 1.

Change 368806 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set maximum yarn vcore allocation to 32

https://gerrit.wikimedia.org/r/368806

gerritbot added a project: Patch-For-Review. Jul 31 2017, 3:33 PM

Change 368806 merged by Ottomata:
[operations/puppet@production] Set maximum yarn vcore allocation to 32

https://gerrit.wikimedia.org/r/368806

Hm, the default should be 32, not sure why you are seeing 4. Anyway, just merged ^. We'll have to wait for a cluster restart (or at least a ResourceManager restart?) for this to take effect. How urgent is this?

Not super urgent; everything certainly works now with 4 cores. I was just doing some measurements to see if there was a sweet spot in vcore-seconds with varied parallelism. Turns out I can use 10k or 30k vcore-seconds to do basically the same thing with different parallelism configs.
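For reference, the vcore-seconds comparison can be approximated with simple arithmetic. This is a rough sketch, not how the numbers in the comment above were obtained: it assumes every executor is held for the whole run and ignores the driver and application master containers. The function names and the example layout are hypothetical; only the 10k and 30k totals come from the comment.

```python
def vcore_seconds(num_executors: int, executor_cores: int,
                  runtime_seconds: float) -> float:
    """Approximate vcore-seconds for a job whose executors all run
    for the full duration (driver/AM containers are ignored)."""
    return num_executors * executor_cores * runtime_seconds


def cost_ratio(a: float, b: float) -> float:
    """How many times more vcore-seconds configuration a used than b."""
    return a / b


# Hypothetical layout: 8 executors x 4 cores running for 100 seconds.
print(vcore_seconds(8, 4, 100))

# The two totals quoted in the comment differ by a factor of three.
print(cost_ratio(30_000, 10_000))
```

A layout that finishes faster is not automatically cheaper by this measure; what matters is the product of allocated cores and wall-clock time.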

Ottomata moved this task from Next Up to Ready to Deploy on the Analytics-Kanban board. Aug 1 2017, 1:16 PM

@EBernhardson, Luca just restarted the cluster. Can you tell if the change we merged fixes this?

hdfs getconf now reports 32, and a Spark REPL started with 8 cores per executor is able to get executors and run code. Looks to be working! I'll try it out with model training a little later, but I'm not expecting any problems.

Ottomata moved this task from Ready to Deploy to Done on the Analytics-Kanban board. Aug 29 2017, 4:57 PM

Nuria closed this task as Resolved. Sep 12 2017, 9:19 PM
