In T338057 we discussed performing a full migration of our production Spark deployment from its current version, 3.1.2, to either 3.3.x or 3.4.x, complete with the YARN shuffle service.
This work was proposed in order to support ongoing development on T333013: [Iceberg Migration] Apache Iceberg Migration and the Dumps 2.0 work in T330296.
However, in T340861 @xcollazo was able to validate a workaround that lets us run production jobs with a Spark version different from the one deployed on the cluster, by supplying a custom conda environment.
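As a rough sketch of what that workaround looks like from the job side (the archive name `pyspark_conda_env.tar.gz` and the paths below are placeholders, not the exact setup validated in T340861), a conda-pack'ed environment bundling the desired PySpark version can be shipped alongside the job:

```python
import os
from pyspark.sql import SparkSession

# Hedged sketch: ship a conda-pack'ed environment that bundles a newer
# PySpark so executors run it instead of the cluster's default 3.1.2 stack.
# "pyspark_conda_env.tar.gz" is a placeholder archive name.
os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

spark = (
    SparkSession.builder
    .master("yarn")
    # The archive is unpacked on each node under the alias after the '#';
    # on YARN, spark.archives is equivalent to spark.yarn.dist.archives.
    .config("spark.archives", "pyspark_conda_env.tar.gz#environment")
    .getOrCreate()
)
```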
The most significant remaining challenge with this approach is that our YARN NodeManagers run only one Spark shuffle service, namely that of Spark 3.1.2.
However, it is possible to run multiple versions of the Spark shuffle service for YARN in parallel:
https://spark.apache.org/docs/latest/running-on-yarn.html#running-multiple-versions-of-the-spark-shuffle-service
...and select which one to use at job submission time.
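Concretely, once an additional shuffle service instance is registered on the NodeManagers under its own aux-service name, a job can opt into it through `spark.shuffle.service.name` (available since Spark 3.2.0). A minimal sketch in PySpark, assuming the 3.4.1 instance was registered as `spark_shuffle_3_4_1` (the real name would have to match whatever yarn.nodemanager.aux-services entry we configure):

```python
from pyspark.sql import SparkSession

# Minimal sketch: opt into a specific shuffle service instance per job.
# "spark_shuffle_3_4_1" is an assumed aux-service name; it must match the
# yarn.nodemanager.aux-services entry configured on the NodeManagers.
spark = (
    SparkSession.builder
    .master("yarn")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.shuffle.service.name", "spark_shuffle_3_4_1")
    .config("spark.dynamicAllocation.enabled", "true")
    .getOrCreate()
)
```

Jobs that do not set `spark.shuffle.service.name` keep using the default `spark_shuffle` instance, so existing 3.1.2 jobs are unaffected.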
Running multiple shuffle services seems like a useful mechanism for us to employ to help speed up development work.
In addition, it will likely make future upgrades of the production Spark version easier, since we would not have to coordinate code changes to the Spark jobs with a big-bang upgrade of the shuffle service.
Done so far:
- Spark 3.3.2 Shuffle service available in the cluster
- Spark 3.4.1 Shuffle service available in the cluster
- Corresponding Spark assembly files available in the cluster (T345440 will take care of this)