When connecting to the cluster with `SparkR` from **stat1005**, the following behavior is observed; it possibly indicates that R is not installed on all worker nodes:
**1. Connect (note: library(SparkR) is already loaded)**
```
sparkR.session(master = "yarn", appName = "SparkR", sparkHome = "/usr/lib/spark2/",
               sparkConfig = list(spark.driver.memory = "4g",
                                  spark.driver.cores = "1",
                                  spark.executor.memory = "2g"))
```
The session appears to start correctly:
```
Spark package found in SPARK_HOME: /usr/lib/spark2/
Launching java with spark-submit command /usr/lib/spark2//bin/spark-submit --driver-memory "4g" sparkr-shell /tmp/RtmpPd5D4Z/backend_portb1d9abf347
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
18/04/17 10:24:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Java ref type org.apache.spark.sql.SparkSession id 1
```
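Everything up to this point runs on the driver (stat1005), where R is available, which is consistent with the session starting cleanly. A driver-only sanity check, a minimal sketch assuming the `sparkR.version()` and `sparkR.conf()` helpers of the installed Spark 2 release, also succeeds without involving the executors:
```
# Driver-side checks only; no R process is launched on the executors,
# so these succeed even if the workers lack R.
sparkR.version()                      # Spark version string
sparkR.conf("spark.driver.memory")    # driver memory as configured ("4g")
```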
**2. Do something (in fact, anything will result in the same problem):**
```
df <- createDataFrame(iris)
Warning messages:
1: In FUN(X[[i]], ...) :
Use Sepal_Length instead of Sepal.Length as column name
2: In FUN(X[[i]], ...) :
Use Sepal_Width instead of Sepal.Width as column name
3: In FUN(X[[i]], ...) :
Use Petal_Length instead of Petal.Length as column name
4: In FUN(X[[i]], ...) :
Use Petal_Width instead of Petal.Width as column name
```
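As an aside, these warnings can be avoided by replacing the dots in the R column names before the Spark DataFrame is created; a minimal sketch (the copy `iris2` is an illustrative name, not from the original session):
```
# Rename the columns up front so createDataFrame() does not have to.
iris2 <- iris
names(iris2) <- gsub(".", "_", names(iris2), fixed = TRUE)
df <- createDataFrame(iris2)
```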
and then run a simple filter:
```
head(filter(df, df$Sepal_Length > 0))
```
This fails with what essentially boils down to:
```
18/04/17 10:28:25 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, analytics1045.eqiad.wmnet, executor 2): java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
```
(verbose output is suppressed here).
After searching for a possible cause for some time, I noted that the first piece of advice one typically finds is to ask whether R is installed on all worker nodes.
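That advice fits the symptom here. As far as I understand it, a DataFrame built with `createDataFrame()` from a local R data.frame is held as R-serialized partitions, so any action on it (even a JVM-side `filter`/`head`) requires the executors to spawn an `Rscript` worker to deserialize the data, and that spawn is exactly what fails. A direct probe of the workers, sketched with SparkR's `spark.lapply()` (which likewise launches `Rscript` on each executor), would be:
```
# spark.lapply() ships each element to an executor, which must start an
# Rscript worker process to evaluate the function. On a healthy cluster
# this returns worker hostnames; here it should fail with the same
# java.io.IOException: Cannot run program "Rscript".
nodes <- spark.lapply(1:16, function(i) Sys.info()[["nodename"]])
unique(unlist(nodes))
```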
**NOTE.** The same error occurs when starting a SparkR session as documented at [[ https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark | Analytics/Systems/Cluster/Spark ]]:
```
spark2R --master yarn --executor-memory 2G --executor-cores 1 --driver-memory 4G
```
Please advise.
**NOTE**: Since the problem described above has been resolved, this ticket will now be used to report the results of SparkR tests.