Instead of copying the Spark libraries to HDFS for every job, it is best practice to use a single Spark assembly archive stored on HDFS.
The archive can be built from the jars shipped with the pyspark package, for example:
jar cv0f spark-3.1.2-assembly.zip -C /usr/lib/airflow/lib/python3.7/site-packages/pyspark/jars/ .
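As a minimal sanity check, the archive contents can be listed afterwards (the jar tool handles .zip file names the same way it handles .jar):
jar tf spark-3.1.2-assembly.zip | head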
The file should then be copied to HDFS if it does not already exist at:
/user/spark/share/lib/spark-3.1.2-assembly.zip
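A sketch of the copy, assuming the hdfs CLI is available on the build host (adjust ownership to whatever the cluster's conventions require):
hdfs dfs -test -e /user/spark/share/lib/spark-3.1.2-assembly.zip || {
  hdfs dfs -mkdir -p /user/spark/share/lib
  hdfs dfs -put spark-3.1.2-assembly.zip /user/spark/share/lib/
}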
Once this is done, we can update the spark3 configuration to reference the assembly:
https://github.com/wikimedia/puppet/blob/13dd484c4012d3c978ff7ccc244767adb5977610/modules/profile/templates/hadoop/spark3-defaults.conf.erb#L51
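The rendered default would then be expected to look roughly like the line below; the property name spark.yarn.archive is an assumption based on standard Spark-on-YARN configuration, not taken from the template itself:
spark.yarn.archive    hdfs:///user/spark/share/lib/spark-3.1.2-assembly.zip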
Finally, remove the corresponding setting from the Airflow configuration:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/wmf_airflow_common/config/experimental_spark_3_dag_default_args.py#L27
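To locate the setting to drop, something along these lines can be run in an airflow-dags checkout (this assumes the default args override spark.yarn.archive; the exact key may differ):
git grep -n 'spark.yarn.archive' wmf_airflow_common/config/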