This ticket is closely aligned with T318535: Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1.5 Sprints] and forms part of an early experimentation phase of T308317: Data Infrastructure as a Service MVP, in support of T302728: Analytics Platform Future State Planning.
We would like to be able to test the Spark on K8S operator on the DSE cluster: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
The intended outcome is to be able to execute a Spark job as a normal user on a stat box using sparkctl create.
The nature of the Spark job itself is not important at this stage. It could be stateless.
In future we will need to investigate the capabilities of both the HDFS and Ceph storage back-ends.
Run the Spark on K8S operator on the DSE cluster
- Make the spark-on-k8s operator packages/images available for use
- Add the spark-on-k8s operator privileged components to the dse-k8s cluster
- Add the sparkctl binary to the stat boxes
- Submit a Spark job to the dse-k8s cluster
- A Spark job can be launched successfully on the dse-k8s cluster with sparkctl from a stat box, and its execution can be monitored and its logs retrieved.
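
As a rough illustration of the final step, the sketch below builds a minimal, stateless SparkApplication manifest (the SparkPi example that ships with Spark) and lists the sparkctl invocations a user might run from a stat box. Everything specific here is an assumption for illustration only: the application name, the "spark" namespace and service account, the image reference (a placeholder, since no image has been published yet), the Spark version and the jar path. The manifest is emitted as JSON on the assumption that sparkctl create accepts it because JSON is a subset of YAML; this has not been verified against the DSE cluster.

```python
import json


def build_spark_pi_manifest(name, namespace, image, main_jar, spark_version="3.1.1"):
    """Return a minimal SparkApplication resource as a plain dict.

    All argument values are supplied by the caller; nothing here reflects
    an agreed configuration for the dse-k8s cluster.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,
            "mainClass": "org.apache.spark.examples.SparkPi",
            "mainApplicationFile": main_jar,
            "sparkVersion": spark_version,
            "restartPolicy": {"type": "Never"},
            # "spark" is an assumed service account name for the driver.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": 1, "memory": "512m"},
        },
    }


def sparkctl_commands(name, namespace, manifest_path):
    """Return the sparkctl calls to submit, check, and read logs for the job."""
    ns = ["--namespace", namespace]
    return [
        ["sparkctl", "create", manifest_path] + ns,
        ["sparkctl", "status", name] + ns,
        ["sparkctl", "log", name] + ns,
    ]


if __name__ == "__main__":
    manifest = build_spark_pi_manifest(
        name="spark-pi",
        namespace="spark",
        image="registry.example.invalid/spark:3.1.1",  # hypothetical image
        main_jar="local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar",
    )
    # Save this output to a file (e.g. spark-pi.json) before running sparkctl.
    print(json.dumps(manifest, indent=2))
    for cmd in sparkctl_commands("spark-pi", "spark", "spark-pi.json"):
        print(" ".join(cmd))
```

The job is deliberately stateless: it reads no input and writes no output, so it exercises only the operator, the image and the submission path, leaving HDFS and Ceph out of scope.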