Following this article: https://towardsdatascience.com/how-does-facebook-tune-apache-spark-for-large-scale-workloads-3238ddda0830
Todo:
- patch jobs
- Write doc
Subject | Repo | Branch | Lines +/-
---|---|---|---
Update big spark jobs settings | analytics/refinery/source | master | +45 -5
Change 482661 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Update big spark jobs settings
Change 482661 merged by jenkins-bot:
[analytics/refinery/source@master] Update big spark jobs settings
Reviewed the doc and fixed some spelling. I don't know what spill files are, but the rest made sense to me.
For the record @Milimetric: spill files are the temporary files written between stages when data doesn't fit in memory (they're called spill files because memory fills up first, then the excess spills out to disk). For big jobs, those represent a lot of data and I/O.
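To illustrate, spilling behavior in Spark is mainly governed by a handful of standard configuration properties. The sketch below is a hypothetical spark-submit invocation with illustrative values, not the actual settings from change 482661; the job name and resource sizes are assumptions.

```shell
# Hypothetical spark-submit for a big job; values are illustrative only.
spark-submit \
  --executor-memory 16g \
  --conf spark.memory.fraction=0.6 \          # share of heap for execution+storage; rest spills sooner
  --conf spark.memory.storageFraction=0.5 \   # portion of the above protected for cached data
  --conf spark.shuffle.spill.compress=true \  # compress spill files to trade CPU for less disk I/O
  --conf spark.io.compression.codec=lz4 \     # codec used for shuffle/spill data
  --class org.example.BigJob \
  big-job.jar
```

Raising executor memory or spark.memory.fraction reduces how often data spills, at the cost of more GC pressure; compressing spills shrinks the disk I/O those temporary files generate.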
Change 482661 merged by Fdans:
[analytics/refinery/source@master] Update big spark jobs settings