Sometimes, a Spark application launched by wmfdata.hive.run suffers an error or crash. This can manifest as errors like the following:
```
Py4JJavaError: An error occurred while calling o708.collectToPython.
: java.lang.OutOfMemoryError: Java heap space
```
```
Py4JJavaError: An error occurred while calling o275.collectToPython.
: org.apache.spark.SparkException: Job 3 cancelled because SparkContext was shut down
```
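For reference, crashes like these often follow a call that pulls a large result set back into the driver's memory. The sketch below is a hypothetical illustration (the query is made up, and wmfdata.hive.run's exact parameters vary between wmfdata versions):
```
import wmfdata

# Pulling a large result set into driver memory is a common trigger for
# the "Java heap space" error above. Hypothetical query for illustration.
results = wmfdata.hive.run("""
    SELECT *
    FROM wmf.webrequest
    WHERE year = 2021 AND month = 1 AND day = 1
""")
```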
In some but not all cases, these errors could be prevented by fixing T245097.
This type of crash leaves the Spark session (or its Python representation; the difference isn't fully clear to me) in a defective state that blocks a new session from being created. Some of its functions that do not involve data processing (such as returning its display representation or passing a query to .sql) may complete without error, but anything that does involve data processing (such as .collect or .toPandas) fails immediately. Examples include:
```
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
```
```
Py4JJavaError: An error occurred while calling o638.sql.
: java.lang.NullPointerException
```
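One way to see this distinction is to compare a lazy operation with an action on the same session. This is a minimal sketch; the final line relies on PySpark's internal _jsc attribute, which is an implementation detail and may change between Spark versions:
```
import wmfdata

spark = wmfdata.spark.get_session()

# Building a DataFrame only constructs a query plan, so it can succeed
# even when the underlying SparkContext is defective...
df = spark.sql("SELECT 1 AS probe")

# ...but an action that runs an actual Spark job fails immediately.
try:
    df.collect()
except Exception as e:
    print(f"Data-processing call failed: {e}")

# PySpark-internal way to check whether the context has been stopped.
print(spark.sparkContext._jsc.sc().isStopped())
```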
In some cases, retrieving the PySpark session object using wmfdata.spark.get_session and explicitly calling .stop on it (sketched after the error message below) allows a new, functioning session to be created. In other cases, however, the "new" session still produces errors like the following:
```
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
```
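Put together, the recovery attempt described above looks like this sketch (assuming wmfdata.spark.get_session can be called without arguments, which may differ by wmfdata version):
```
import wmfdata

# Retrieve the existing, defective session object...
spark = wmfdata.spark.get_session()

# ...and stop it explicitly so a replacement can be created.
spark.stop()

# In some cases this now returns a working session; in others it raises
# the "Only one SparkContext may be running in this JVM" error above,
# and restarting the notebook's Python kernel (which discards the whole
# JVM) is the only option I know of.
spark = wmfdata.spark.get_session()
```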