This ticket is closely aligned with T318535: Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1.5 Sprints] and forms part of an early, experimental phase of T308317: Data Infrastructure as a Service MVP, in support of T302728: Analytics Platform Future State Planning.
We would like to test the Spark on K8s operator on the DSE cluster: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
The intended outcome is that a normal user on a stat box can execute a Spark job using sparkctl create:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/sparkctl/README.md#create
The nature of the Spark job itself is not important at this stage; it could be stateless.
In future we will need to investigate the capabilities of both the HDFS and Ceph storage back-ends.
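For illustration, a minimal SparkApplication manifest of the kind `sparkctl create` accepts could look like the sketch below. This is not a decided configuration: the name, namespace, service account, and image are placeholders, and the stateless SparkPi example job is taken from the upstream operator documentation.

```yaml
# Hypothetical manifest for a stateless test job; names and image are placeholders.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-test        # placeholder application name
  namespace: spark-jobs      # assumed namespace on the dse-k8s cluster
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"   # example image from the operator docs
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark    # assumed service account granted to the operator
  executor:
    cores: 1
    instances: 1
    memory: "512m"
```

Such a manifest would then be submitted from a stat box with `sparkctl create <manifest>.yaml`.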
Goal:
Run the Spark on K8s operator on the DSE cluster
Tasks:
- Make the spark-on-k8s operator packages/images available for use
- Add the spark-on-k8s operator privileged components to the dse-k8s cluster
- Add the sparkctl binary to the stat boxes
- Submit a spark job to the dse-k8s cluster
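Once the tasks above are complete, the submit-and-monitor flow from a stat box might look like the following sketch. The application name and manifest path are hypothetical; the subcommands themselves (create, status, log, delete) are documented in the upstream sparkctl README.

```shell
# Submit the application defined in a local manifest (hypothetical path)
sparkctl create spark-pi-test.yaml

# Check the current state of the driver and executors
sparkctl status spark-pi-test

# Stream the driver log to verify execution
sparkctl log spark-pi-test

# Clean up when finished
sparkctl delete spark-pi-test
```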
Outcomes:
- Can successfully launch a Spark job on the dse-k8s cluster with sparkctl from a stat box and monitor/log its execution.