
Some worker nodes don't seem to have numpy installed
Closed, Invalid · Public

Description

From stat1007 I'm trying to submit a Spark script like so:

spark2-submit --master yarn --deploy-mode cluster --archives ~/refinery/artifacts/article-recommender-venv.zip#venv --conf spark.pyspark.python=./venv/bin/python ~/refinery/oozie/article_recommender/main.py
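For context, the #venv fragment makes YARN unpack the archive into a directory named venv inside each container's working directory, which is why spark.pyspark.python points at ./venv/bin/python. A minimal sketch of the assumed packaging step (the local venv path is illustrative), zipping from the root of the environment so that bin/python sits at the top of the archive:

cd ~/article-recommender-venv
zip -qr ~/refinery/artifacts/article-recommender-venv.zip .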

Notice that I'm also submitting my own virtual environment, which has article-recommender installed, and using the Python interpreter from that environment to run main.py. Below you can see that numpy can't be imported when the default 'pyspark.zip' supplied by spark-submit loads pyspark.ml.

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    from article_recommender import recommend
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/venv/lib/python3.5/site-packages/article_recommender/recommend.py", line 24, in <module>
    from pyspark.ml import Pipeline
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/pyspark.zip/pyspark/ml/__init__.py", line 22, in <module>
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/pyspark.zip/pyspark/ml/base.py", line 24, in <module>
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in <module>
ImportError: No module named 'numpy'

The application ID is application_1553166148342_1227.

Is it true that numpy is not installed on that worker node, or is the problem somewhere else?

Event Timeline

bmansurov closed this task as Invalid. Mar 21 2019, 10:14 PM

Never mind, I had to create the virtual environment with --system-site-packages.
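For reference, a minimal sketch of the fix, assuming the environment is recreated and re-zipped the same way as before (the package name is taken from the import in the traceback; paths are illustrative):

python3 -m venv --system-site-packages article-recommender-venv   # let the venv fall back to the node's system site-packages
source article-recommender-venv/bin/activate
pip install article-recommender
deactivate
cd article-recommender-venv
zip -qr ~/refinery/artifacts/article-recommender-venv.zip .

With --system-site-packages, the import of numpy inside pyspark.zip resolves against the numpy installed on the worker nodes instead of failing inside the isolated environment.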