Currently, spark.run (and the draft spark.load_parquet) set a Spark session timeout before returning. The goal is to keep application master processes from persisting for a long time when they're not being used. However, this system doesn't work perfectly:
- The timeout logic is fairly complex, which makes the spark module harder to work on.
- We don't set timeouts on custom sessions (and we clear any existing timeout whenever get_session or get_custom_session is called), because we can't track when those sessions are used. But this also means a user can get a session, save the handle in a variable, start using it, call run (which arms a timeout), and then go back to using the saved handle directly without calling get_session again; the timeout set by run can then unexpectedly kill the session while it's still in use. A sketch of this sequence follows the list.
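
To make the second bullet concrete, here is a minimal, self-contained sketch of how a run-style idle timeout can kill a session a user is still holding on to. Everything here is illustrative: FakeSession, _set_timeout, _clear_timeout, and IDLE_TIMEOUT_SECONDS are hypothetical stand-ins, and the simplified run/get_custom_session only mimic the real module's behavior as described above, not its actual implementation.

```python
import threading
import time

IDLE_TIMEOUT_SECONDS = 2  # illustrative; a real idle timeout would be much longer


class FakeSession:
    """Stand-in for a session handle returned by get_custom_session()."""

    def __init__(self, name):
        self.name = name
        self.stopped = False

    def sql(self, query):
        if self.stopped:
            raise RuntimeError(f"session {self.name!r} was stopped by the idle timeout")
        return f"result of {query!r}"

    def stop(self):
        self.stopped = True


_timeout_timer = None


def _set_timeout(session):
    """Schedule the session to be stopped after the idle timeout (what run does on exit)."""
    global _timeout_timer
    _clear_timeout()
    _timeout_timer = threading.Timer(IDLE_TIMEOUT_SECONDS, session.stop)
    _timeout_timer.start()


def _clear_timeout():
    """Cancel any pending timeout (what get_session / get_custom_session do)."""
    global _timeout_timer
    if _timeout_timer is not None:
        _timeout_timer.cancel()
        _timeout_timer = None


def get_custom_session():
    _clear_timeout()
    return FakeSession("custom")


def run(session, query):
    """Run a query, then arm the idle timeout before returning."""
    result = session.sql(query)
    _set_timeout(session)
    return result


if __name__ == "__main__":
    session = get_custom_session()         # user saves the handle in a variable
    print(session.sql("SELECT 1"))         # direct use: fine, no timeout armed yet
    print(run(session, "SELECT 2"))        # run arms the timeout on its way out
    time.sleep(IDLE_TIMEOUT_SECONDS + 1)   # user does something else for a while
    try:
        print(session.sql("SELECT 3"))     # direct use again, without get_custom_session
    except RuntimeError as exc:
        print(f"surprise: {exc}")          # the timeout has already killed the session
```

The key point the sketch shows is that only get_session / get_custom_session clear the pending timeout; using the saved handle directly does not, so the timer armed by run fires even though the session is actively in use.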
I suggested we simply remove timeouts entirely. This would mean more application masters sitting idle for long stretches, but the resource consumption there (one executor) is tiny compared to what actually gets used when an application is run (tens to hundreds of executors). If we're concerned about resource use, there's a much better place to control it: the Jupyter kernel level. If the kernel gets shut down, the driver and application master go with it, and this neatly frees up the resources used by the kernel itself too.