Following this article: https://towardsdatascience.com/how-does-facebook-tune-apache-spark-for-large-scale-workloads-3238ddda0830
Todo:
- patch jobs
- Write doc
Subject | Repo | Branch | Lines +/-
---|---|---|---
Update big spark jobs settings | analytics/refinery/source | master | +45 -5
Change 482661 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery/source@master] Update big spark jobs settings
Change 482661 merged by jenkins-bot:
[analytics/refinery/source@master] Update big spark jobs settings
Reviewed the doc and fixed some spelling. I don't know what spill files are, but the rest made sense to me.
For the record @Milimetric: spill files are the temporary files written between stages when data doesn't fit in memory (they're called spill files because memory fills up first, then the excess spills out to disk). For big jobs, those represent a lot of data and I/O.
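To illustrate, spilling behavior in Spark is mainly governed by a handful of standard configuration properties. The sketch below is a hypothetical spark-submit invocation with illustrative values, not the actual settings from change 482661; the job name and resource sizes are assumptions.

```shell
# Hypothetical spark-submit for a big job; values are illustrative only.
spark-submit \
  --executor-memory 16g \
  --conf spark.memory.fraction=0.6 \          # share of heap for execution+storage; rest spills sooner
  --conf spark.memory.storageFraction=0.5 \   # portion of the above protected for cached data
  --conf spark.shuffle.spill.compress=true \  # compress spill files to trade CPU for less disk I/O
  --conf spark.io.compression.codec=lz4 \     # codec used for shuffle/spill data
  --class org.example.BigJob \
  big-job.jar
```

Raising executor memory or spark.memory.fraction reduces how often data spills, at the cost of more GC pressure; compressing spills shrinks the disk I/O those temporary files generate.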
Change 482661 merged by Fdans:
[analytics/refinery/source@master] Update big spark jobs settings