In our Iceberg Working Session we ran out of time before discussing the Spark bump; however, there was async support for it.
Starting with v1.4, Iceberg has dropped support for Spark 3.1, our current production version.
Options:
a) The Spark community released 3.4.0 on April 13, 2023. Iceberg just released version 1.3.0 with support for Spark 3.4. This is the bleeding edge, and as with any .0 feature release there is a risk of bugs in both Spark and Iceberg. We would have to bump Iceberg as well. We do win the longest runway. Update: Spark 3.4.1 is now available. Second update: Spark 3.5.0 is also now available.
b) The Spark community released 3.3.2 on Feb 17, 2023. Iceberg has supported Spark 3.3 since 0.14.0. We already have Iceberg 1.2.1, which supports Spark 3.3, and 3.3.2 is stable and well tested by now. We get a shorter runway with this option.
Whether we bump to the 3.3, 3.4, or 3.5 line, we do win a bunch of perf improvements that will go well with T332765.
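The Iceberg/Spark compatibility facts stated in the options above can be captured as a small lookup, which may help when comparing candidate combinations. This is only a sketch: the table holds just what this task states (Spark 3.3 since Iceberg 0.14.0, Spark 3.4 since 1.3.0, Spark 3.1 dropped starting with 1.4), and the names `SUPPORT`, `parse`, and `iceberg_supports` are made up for illustration. Anything the task does not state, such as Spark 3.5 support, is reported as unknown rather than guessed.

```python
# Compatibility facts taken only from this task description.
# Spark line -> (first Iceberg version that supports it,
#                first Iceberg version that dropped it);
# None means "not stated in this task".
SUPPORT = {
    "3.1": (None, (1, 4, 0)),
    "3.3": ((0, 14, 0), None),
    "3.4": ((1, 3, 0), None),
}


def parse(version):
    """Turn a dotted version string such as '1.2.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def iceberg_supports(iceberg_version, spark_line):
    """Return True or False, or None when this task gives no information."""
    if spark_line not in SUPPORT:
        return None
    first, dropped = SUPPORT[spark_line]
    version = parse(iceberg_version)
    if dropped is not None and version >= dropped:
        return False
    if first is None:
        return None
    return version >= first


if __name__ == "__main__":
    for line in ("3.1", "3.3", "3.4", "3.5"):
        print(line, iceberg_supports("1.2.1", line))
```

With our current Iceberg 1.2.1, the lookup reflects option b) working as-is and option a) requiring an Iceberg bump first.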
Migration guides:
https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-31-to-32
https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-32-to-33
https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-33-to-34
https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-34-to-35
Considering that the migration guides do list breaking changes, such as the ADD JAR syntax and CSV output defaults (I originally thought there were none), it does seem like we should consider making the new Spark version available alongside the current version for a while. Perhaps by making it available as spark3_4-submit, etc?
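One possible shape for the side-by-side setup is a thin launcher that maps a versioned wrapper name to its own Spark installation. The sketch below is only an illustration under assumptions: the wrapper names follow the spark3_4-submit pattern mentioned above, and the install paths, the `SPARK_HOMES` table, and the `build_command` helper are all hypothetical and do not describe how our clusters are actually laid out.

```python
import posixpath

# Hypothetical wrapper names and install locations; neither is defined by this task.
SPARK_HOMES = {
    "spark3_1-submit": "/opt/spark-3.1.2",  # current production version
    "spark3_5-submit": "/opt/spark-3.5.3",  # candidate version
}


def build_command(wrapper_name, args):
    """Translate a versioned wrapper name into a spark-submit argument list."""
    try:
        home = SPARK_HOMES[wrapper_name]
    except KeyError:
        raise ValueError(f"unknown Spark wrapper: {wrapper_name}") from None
    return [posixpath.join(home, "bin", "spark-submit"), *args]


if __name__ == "__main__":
    # A real wrapper would hand this list to subprocess.run or os.execv.
    print(build_command("spark3_5-submit", ["--version"]))
```

Keeping the version in the wrapper name lets users opt in job by job, and switching the default later only means changing which installation the unversioned name points at.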
In this task we should:
- Decide whether to bump to Spark 3.3.X, 3.4.X, or 3.5.X line.
  - We will target version 3.5.3 (for now)
- Decide whether to remove current Spark 3.1.2, or to have it available at the same time for a while.
  - We can't realistically do this at present
- Install it on the test cluster and run sanity tests.
- Install it on the main cluster.
- Allow users to migrate their own jobs from Spark 3.1.2 to Spark 3.5.3
- Configure the use of Spark 3.5.3 by default