After a Spark crash (such as those caused by T245896), a Spark application can be left in a defective state that blocks a new session from being created. Operations that do not involve data processing (such as returning the session's display representation or passing a query to .sql) may complete without error, but anything that does involve data processing (like .collect or .toPandas) fails immediately. Example errors include:
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
Py4JJavaError: An error occurred while calling o638.sql. : java.lang.NullPointerException
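Because plan-building operations can still succeed on a broken session, one way to tell whether a session is in this state is to force a trivial action and see whether it completes. The helper below is an illustrative sketch (the function name and structure are our own, not part of Spark or wmfdata):

```python
def session_is_healthy(spark):
    """Return True if the session can still execute a simple job.

    Operations that merely build a query plan can succeed on a broken
    session, so we force an actual action (.count) that requires real
    data processing on the cluster.
    """
    try:
        spark.range(1).count()  # minimal job that exercises the executors
        return True
    except Exception:
        # e.g. IllegalStateException: Cannot call methods on a stopped
        # SparkContext, surfaced through Py4J
        return False
```
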
In some cases, the ApplicationMaster continues to run; submitting a new query causes the application to start a new job and keep its allocated executors, but the job never makes any progress. For example:
In this case, explicitly calling .stop on the SparkSession causes a series of additional ApplicationMasters to be created before the application is finally stopped:
In some cases, retrieving the PySpark session object using wmfdata.spark.get_session and explicitly calling .stop on it allows a new, functioning session to be created. In other cases, however, the "new" session still fails with errors like the following:
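The stop-and-recreate procedure described above can be sketched as follows. This is a hypothetical wrapper around a session factory such as wmfdata.spark.get_session; the function name and error handling are illustrative, and as noted this recovery does not always succeed:

```python
def recreate_session(get_session):
    """Stop the current (possibly broken) session and request a new one.

    `get_session` is a zero-argument callable that returns a session,
    e.g. wmfdata.spark.get_session. Stopping can itself fail if the
    underlying SparkContext is already gone, so that error is swallowed.
    """
    old = get_session()
    try:
        old.stop()
    except Exception:
        pass  # the context may already be stopped
    return get_session()
```
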
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.