In https://phabricator.wikimedia.org/T273711#6817104 we encountered a problem where Spark jobs were still using old Hadoop versions, even though our cluster's Hadoop had been upgraded. This led to strange issues like https://www.irccloud.com/pastebin/qfy1lpD8/.
We should stop packaging Hadoop dependencies with our installed Spark distribution, and instead always use the cluster-provided ones.
https://spark.apache.org/docs/latest/hadoop-provided.html
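Per that doc, a "Hadoop free" Spark build picks up the cluster's Hadoop jars at runtime through `SPARK_DIST_CLASSPATH`. A minimal sketch of the required `conf/spark-env.sh` entry on each node (assuming the cluster-installed `hadoop` binary is on the PATH):

```shell
# conf/spark-env.sh
# Point the Hadoop-free Spark build at the cluster's Hadoop jars.
# `hadoop classpath` prints the classpath of the locally installed Hadoop,
# so Spark always uses the same Hadoop version as the cluster.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```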
To do this, we need to rebuild the Spark Debian package from the "Hadoop free" Spark tarball, and recreate and re-upload spark-2.4.4-assembly.jar without the bundled Hadoop jars.
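A rough sketch of those steps (the tarball name follows upstream's release naming; the packaging and upload details depend on our existing build scripts and are assumptions here):

```shell
# 1. Fetch the "Hadoop free" upstream tarball instead of a Hadoop-bundled one.
wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-without-hadoop.tgz
tar -xzf spark-2.4.4-bin-without-hadoop.tgz

# 2. Rebuild the Debian package from this tarball using our existing
#    packaging scripts (details omitted; depends on our build setup).

# 3. Recreate the assembly archive from the now hadoop-less jars/ directory
#    and re-upload it to wherever our jobs expect to find it.
cd spark-2.4.4-bin-without-hadoop
zip -r spark-2.4.4-assembly.jar jars/
```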