We would like to use the thin client for Spark in order to write data from an Elixir application. It may be possible to do this by starting a Spark Connect server on YARN.
Demonstrate that Spark Connect can be run on the Analytics cluster
References:
https://wikitech.wikimedia.org/wiki/HTTP_proxy#Maven_proxy_configuration_example
Experimental commands:
source /opt/conda-analytics/etc/profile.d/conda.sh
conda create -n spark34 python=3.10.8 pyspark=3.4.1 conda-pack=0.7.0 ipython jupyterlab=3.4.8 jupyterhub-singleuser=1.5.0 urllib3=1.26.11
conda activate spark34
pip install grpcio==1.48.1 protobuf grpcio-status
wget https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1.tgz
Attempt to fetch the spark-connect jar and put it into HDFS. (Update: compiled with the wrong version of Java)
spark3-submit --conf spark.jars.ivySettings=/etc/maven/ivysettings.xml --master yarn --packages=org.apache.spark:spark-connect_2.12:3.4.1 org.apache.spark.sql.connect.service.SparkConnectServer
hdfs dfs -put ~/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.1.jar /user/awight/org.apache.spark_spark-connect_2.12-3.4.1.jar
Runs the service but eventually fails because of the version mismatch:
spark3-submit --master yarn --class org.apache.spark.sql.connect.service.SparkConnectServer hdfs:///user/awight/org.apache.spark_spark-connect_2.12-3.4.1.jar
error:
Exception in thread "main" java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.<init>(ZIIIIIIZ)V
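A NoSuchMethodError on a netty class, like the one above, usually means the netty version on the classpath differs from the one the jar was built against. One way to check is to list the netty jars shipped with the cluster's Spark install. The following is only a sketch: the helper name netty_jars is made up for illustration, and it assumes that SPARK_HOME is set and that the relevant jars live in its jars/ directory, which has not been verified for the Analytics cluster.

```python
import os
from pathlib import Path


def netty_jars(jars_dir):
    """Return the sorted file names of all netty jars found directly in jars_dir."""
    return sorted(p.name for p in Path(jars_dir).glob("netty*.jar"))


def print_cluster_netty_jars():
    """Print the netty jars under $SPARK_HOME/jars.

    Assumption: SPARK_HOME points at the Spark install used by spark3-submit.
    """
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        print("SPARK_HOME is not set")
        return
    for name in netty_jars(Path(spark_home) / "jars"):
        print(name)
```

Calling print_cluster_netty_jars() on a cluster host shows the bundled netty version in the jar file names, which can then be compared with the netty version that Spark 3.4.1 depends on.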
Trying to compile Spark locally:
mvn package
Run Spark Connect for local development
TODO: not working yet
docker run -it --network spark --name spark-master -p 15002:15002 -p 8082:8080 spark:3.4.1-scala2.12-java11-ubuntu bash -c "/opt/spark/sbin/start-master.sh; tail -f /dev/null"
But this fails to override the Ivy cache configuration:
docker exec -it spark-master bash -c "../sbin/start-connect-server.sh -Divy.home=/tmp/ivy2 `pwd` --packages=org.apache.spark:spark-connect_2.12:3.4.1"
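Once a Connect server does come up, a small client-side check would confirm that it is reachable. The sketch below uses the PySpark 3.4 remote session builder and assumes the server listens on port 15002, the port published by the docker run command above. The helper names connect_url and smoke_test are hypothetical. The Python client may also need packages beyond the ones installed earlier, such as pandas and pyarrow.

```python
def connect_url(host, port=15002):
    """Build a Spark Connect URL for the given host and port."""
    return f"sc://{host}:{port}"


def smoke_test(host="localhost", port=15002):
    """Connect to a running Spark Connect server and count a tiny range.

    Assumption: a Connect server is already listening on host:port.
    """
    # Imported lazily so that connect_url() works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(connect_url(host, port)).getOrCreate()
    try:
        # The range is planned on the client and executed on the server.
        return spark.range(5).count()
    finally:
        spark.stop()
```

With a working server, smoke_test() should return 5. A connection error at this point would suggest the server is not running or the port is not published.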
Write Elixir adapter
We'll also need the Elixir glue to call Spark Connect, WIP in https://gitlab.com/wmde/technical-wishes/apache_spark_connect_ex