
Some worker nodes don't seem to have numpy installed
Closed, Invalid, Public


From stat1007, I'm trying to submit a Spark script like so:

spark2-submit --master yarn --deploy-mode cluster --archives ~/refinery/artifacts/ --conf spark.pyspark.python=./venv/bin/python ~/refinery/oozie/article_recommender/

Notice that I'm also submitting my own virtual environment, which has article-recommender installed, and using that environment's Python to run the file. As the traceback below shows, the default '' file supplied by spark-submit doesn't contain numpy.

Traceback (most recent call last):
  File "", line 11, in <module>
    from article_recommender import recommend
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/venv/lib/python3.5/site-packages/article_recommender/", line 24, in <module>
    from import Pipeline
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/", line 22, in <module>
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/", line 24, in <module>
  File "/var/lib/hadoop/data/f/yarn/local/usercache/bmansurov/appcache/application_1553166148342_1227/container_e06_1553166148342_1227_06_000001/", line 26, in <module>
ImportError: No module named 'numpy'

The application ID is application_1553166148342_1227.

Is numpy really not installed on that worker node, or is the problem somewhere else?
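One way to check this, assuming you can run a Python snippet with the same interpreter the job uses (./venv/bin/python here), is a small diagnostic like the one below. The function name describe_interpreter is hypothetical; it relies only on the standard library (sys.prefix, sys.base_prefix, importlib.util.find_spec). If numpy shows as not importable while the interpreter reports running inside a virtual environment, the module is missing from that environment, which is not necessarily the same as missing from the node.

```python
import importlib.util
import sys


def describe_interpreter(module_names):
    """Report which interpreter is running and which modules it can import.

    Hypothetical helper: run it with the same Python the Spark job uses to
    tell "missing from this environment" apart from "missing from the node".
    """
    return {
        "executable": sys.executable,
        # Inside a virtual environment, sys.prefix points at the venv while
        # sys.base_prefix points at the Python it was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec returns None when a top-level module cannot be found;
        # it does not import the module itself.
        "importable": {
            name: importlib.util.find_spec(name) is not None
            for name in module_names
        },
    }


if __name__ == "__main__":
    print(describe_interpreter(["numpy", "article_recommender"]))
```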

Event Timeline

Never mind, I had to create the virtual environment with --system-site-packages.
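For reference, a minimal sketch of the same fix using Python's standard venv module: passing system_site_packages=True to venv.create is the library equivalent of `python3 -m venv --system-site-packages`. The helper create_env is hypothetical; it creates the environment and reads back pyvenv.cfg to confirm that include-system-site-packages is set. It assumes the worker nodes' base Python already has numpy installed system-wide, which is the situation in which this flag would help.

```python
import pathlib
import tempfile
import venv


def create_env(env_dir, system_site_packages=True):
    """Create a virtual environment and return its pyvenv.cfg settings.

    Hypothetical helper. system_site_packages=True lets the environment also
    import packages installed for the base interpreter (e.g. numpy).
    """
    # with_pip=False keeps the sketch fast and offline; a real environment
    # would normally use with_pip=True and then install article-recommender.
    venv.create(env_dir, system_site_packages=system_site_packages, with_pip=False)
    settings = {}
    cfg = pathlib.Path(env_dir) / "pyvenv.cfg"
    for line in cfg.read_text().splitlines():
        # Lines look like "key = value"; split on the first "=" only.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(pathlib.Path(tmp) / "venv")
        print(cfg["include-system-site-packages"])  # prints: true
```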