User Story
As a Wikimedia researcher, data scientist, or engineer, I want to be able to launch Spark jobs on the Data Science and Engineering (DSE) Kubernetes Cluster from a UI or CLI, so that I can easily run experiments on large datasets stored in HDFS or an object storage system.
Acceptance Criteria
- The user should be able to run Spark jobs on the cluster that access Kerberized services (such as HDFS) from Kubernetes.
- The user should be able to run Spark jobs in interactive (REPL) mode.
- The user should be able to manipulate files stored in HDFS or an object storage system as part of their experiments.
- The user should be able to do this securely and with minimal configuration.
Outstanding Questions
- How would we manage resource contention?
- Can we access GPU hardware from a Spark job on the DSE Cluster?