
Ensure --master=yarn is default in stat host users' spark configs
Closed, Resolved, Public

Description

We have seen cases where users inadvertently omit the --master=yarn setting when launching Spark jobs on stat hosts.

As a result, the Spark driver and executors are not configured to run on YARN and are instead spawned on the local stat host.

This may be one of the causes of stat host instability, because jobs that users expect to run on a dedicated compute cluster instead consume resources on the local machine.
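
To illustrate the behaviour described above, here is a minimal Python sketch of how the effective master could be resolved. It is a simplified model, not Spark's actual implementation: it assumes that a --master flag on the command line takes precedence over a spark.master entry in a spark-defaults.conf style file, and that Spark falls back to local[*] when neither is set. The function names (parse_defaults, master_from_args, effective_master, runs_locally) are hypothetical and exist only for this example. Other ways of setting the master, such as --conf spark.master=... or code that sets it directly, are deliberately ignored.

```python
import re


def parse_defaults(conf_text):
    """Parse spark-defaults.conf style text ("key value" or "key=value") into a dict."""
    props = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = re.split(r"[=\s]+", line, maxsplit=1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def master_from_args(args):
    """Return the value of --master (either "--master yarn" or "--master=yarn"), or None."""
    master = None
    for i, arg in enumerate(args):
        if arg.startswith("--master="):
            master = arg.split("=", 1)[1]
        elif arg == "--master" and i + 1 < len(args):
            master = args[i + 1]
    return master


def effective_master(args, conf_text=""):
    """Assumed precedence: command-line flag, then defaults file, then local[*]."""
    cli_master = master_from_args(args)
    if cli_master:
        return cli_master
    return parse_defaults(conf_text).get("spark.master", "local[*]")


def runs_locally(master):
    """True when the master string selects Spark's local mode."""
    return master.startswith("local")


if __name__ == "__main__":
    # No flag and no default: the job would run on the stat host itself.
    print(effective_master(["spark-submit", "job.py"]))  # local[*]
    # With a site-wide default, omitting the flag still selects YARN.
    print(effective_master(["spark-submit", "job.py"], "spark.master yarn"))  # yarn
```

Under these assumptions, setting a host-wide default for spark.master means that a user who forgets the flag still gets YARN, while a user who explicitly asks for local mode can still have it.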

Event Timeline

bking triaged this task as Medium priority.
BTullis subscribed.

Expediting this work, as the use of Spark in local mode has been linked to some recent stat server instability.

Change #1164272 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Ensure that master=yarn is the default spark configuration for users

https://gerrit.wikimedia.org/r/1164272

Change #1164272 merged by Btullis:

[operations/puppet@production] Ensure that master=yarn is the default spark configuration for users

https://gerrit.wikimedia.org/r/1164272

nshahquinn-wmf subscribed.

We realized that local sessions are still the default if you're using Wmfdata's spark.create_custom_session (although that's not the case if you're using another one of the methods in that module).

@BTullis do you want to put up an MR? I'll be happy to review it 😊

I've put up the MR and requested review from @xcollazo (since @BTullis will be out of office for the next week and a bit).

The Wmfdata MR has been merged.