(Related to T335721)
Yarn requires spark.yarn.archive to be defined: a zip or jar file containing all of Spark's jars. Without it, we lose time at the beginning of every Spark job uploading all of those jars to the cluster.
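As a rough sketch of what such an assembly step looks like (the paths, filenames, and the local demo setup below are illustrative assumptions, not our actual deployment; the real script is linked below):

```shell
# Demo setup: stand-in for $SPARK_HOME/jars with dummy jar files.
SPARK_HOME=/tmp/demo-spark
mkdir -p "$SPARK_HOME/jars"
touch "$SPARK_HOME/jars/spark-core.jar" "$SPARK_HOME/jars/spark-sql.jar"

# Package every jar into a single archive.
ASSEMBLY=/tmp/spark-assembly.zip
rm -f "$ASSEMBLY"
(cd "$SPARK_HOME/jars" && zip -q "$ASSEMBLY" ./*.jar)

# In production the archive would be uploaded to HDFS and referenced in
# spark-defaults.conf, e.g. (hypothetical HDFS path):
echo "spark.yarn.archive=hdfs:///user/spark/share/lib/spark-assembly.zip" \
    > /tmp/spark-defaults.conf
```

With that property set, YARN distributes the single archive via the distributed cache instead of uploading each jar per job.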
We have a manual solution for Spark3 at https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/blob/main/generate_spark_assembly.sh.
We also have an automated solution for Spark2 here. However, that solution does not work with our new way of deploying Spark via pyspark.
In this task we should reconcile these two approaches and make it automated for Spark3 as well.