I'm getting an error when using PySpark in Swap.
I think it is related to using a UDF in my code. See T222253.
The problem might be that pyarrow isn't installed on the worker nodes, even though it is available in the notebook.
This Stack Overflow thread seems relevant: https://stackoverflow.com/questions/51084514/apply-function-per-group-in-pyspark-pandas-udf-no-module-named-pyarrow
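
A quick way to test the "missing on workers" hypothesis might be to try the import on the driver and inside Spark tasks, then compare. The sketch below is only an illustration: `probe_module` and `probe_executors` are hypothetical helper names, not part of PySpark, and `sc` is assumed to be the notebook's existing SparkContext (e.g. `spark.sparkContext`).

```python
import importlib
import socket


def probe_module(module_name="pyarrow"):
    """Return (hostname, version) if module_name imports here, else (hostname, None)."""
    host = socket.gethostname()
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return host, None
    # Some modules do not expose __version__; report "unknown" in that case.
    return host, getattr(mod, "__version__", "unknown")


def probe_executors(sc, module_name="pyarrow", num_tasks=8):
    """Run probe_module inside Spark tasks and return the distinct results.

    `sc` is assumed to be an already running SparkContext. The tasks are not
    guaranteed to land on every worker, so this is a spot check only.
    """
    results = (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(lambda _: probe_module(module_name))
        .collect()
    )
    return sorted(set(results), key=lambda pair: pair[0])


# Driver-side check; this part runs without a Spark session.
print("driver:", probe_module("pyarrow"))
```

In the notebook, calling `probe_executors(spark.sparkContext)` and seeing `None` for the worker hosts while the driver line shows a version would support the explanation above.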
Steps to Reproduce:
Run the following Jupyter notebook on notebook1004 through cell 28:
/user/nathante/notebooks/Bias_analysis_spark.ipynb
Actual Results:
Py4JJavaError: Import Error: no module named pyarrow
Expected Results:
The results of my query are printed.