
POC for Running Spark on DSE
Open, In Progress, Needs Triage, Public


This ticket is closely aligned with T318535: Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1.5 Sprints]. It forms part of an early, experimental phase of T308317: Data Infrastructure as a Service MVP, in support of T302728: Analytics Platform Future State Planning.

We would like to test the Spark on K8s operator on the DSE cluster.

The intended outcome is that a normal user on a stat box can launch a Spark job on the cluster using sparkctl create.
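For illustration, here is a minimal sketch of a stateless job that sparkctl create could submit: the classic SparkPi example expressed as a SparkApplication custom resource. It assumes the operator serves the sparkoperator.k8s.io/v1beta2 API. The namespace "spark", the service account "spark", the image URL and the examples jar path are hypothetical placeholders, not values taken from the DSE cluster. The script writes JSON, which is also valid YAML, so the file can be handed to sparkctl as-is.

```python
"""Build a minimal SparkApplication manifest for `sparkctl create`.

Assumptions (hypothetical, not from the ticket): the operator serves the
sparkoperator.k8s.io/v1beta2 API, POC jobs run in a namespace called
"spark" with a service account of the same name, and a Spark image exists
at the placeholder URL below.
"""
import json

# Hypothetical placeholders: replace with whatever the DSE cluster actually uses.
NAMESPACE = "spark"
SERVICE_ACCOUNT = "spark"
IMAGE = "docker-registry.example.org/spark:3.1.1"
EXAMPLES_JAR = "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"


def spark_pi_manifest(name: str = "spark-pi", executors: int = 1) -> dict:
    """Return a stateless SparkPi job as a SparkApplication custom resource."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": IMAGE,
            "mainClass": "org.apache.spark.examples.SparkPi",
            "mainApplicationFile": EXAMPLES_JAR,
            "sparkVersion": "3.1.1",
            # A one-off POC job should not be restarted automatically.
            "restartPolicy": {"type": "Never"},
            # The driver needs a service account allowed to create executor pods.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": SERVICE_ACCOUNT},
            "executor": {"cores": 1, "instances": executors, "memory": "512m"},
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this file can be passed straight to sparkctl.
    with open("spark-pi.json", "w") as fh:
        json.dump(spark_pi_manifest(), fh, indent=2)
```

On a stat box this would then be followed by something like: sparkctl create spark-pi.json --namespace spark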

The nature of the Spark job itself is not important at this stage; it could be stateless.
In future we will need to investigate the capabilities of both the HDFS and Ceph storage back ends.

Run Spark K8s Operator on the DSE Cluster


  • Make the spark-on-k8s operator packages/images available for use
  • Add the spark-on-k8s operator privileged components to the dse-k8s cluster
  • Add the sparkctl binary to the stat boxes
  • Submit a spark job to the dse-k8s cluster
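Before submitting anything, a stat box can be checked for the two local prerequisites implied by the tasks above: a sparkctl binary on the PATH and a Kubernetes config file for it to use. The sketch below is a hypothetical helper, not an existing tool. It assumes the kubeconfig lives at ~/.kube/config (which, as far as we know, is the default that sparkctl's --kubeconfig flag falls back to), and it never contacts the cluster.

```python
"""Hypothetical preflight check for a stat box before running sparkctl.

Only the local machine is inspected: is a sparkctl binary on PATH, and is
there a kubeconfig file sparkctl could use? The cluster is not contacted.
"""
from __future__ import annotations

import shutil
from pathlib import Path

# Assumed default location; sparkctl can be pointed elsewhere with --kubeconfig.
DEFAULT_KUBECONFIG = Path.home() / ".kube" / "config"


def preflight(kubeconfig: Path | None = None) -> list[str]:
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if shutil.which("sparkctl") is None:
        problems.append("sparkctl not found on PATH")
    cfg = kubeconfig if kubeconfig is not None else DEFAULT_KUBECONFIG
    if not cfg.is_file():
        problems.append(f"no kubeconfig at {cfg}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("ok" if not issues else "\n".join(issues))
```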


  • Can successfully launch a Spark job on the dse-k8s cluster with sparkctl from a stat box, and monitor its execution and view its logs.
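The success criterion above amounts to a submit-then-inspect loop driven from the stat box. The following is a minimal sketch, assuming sparkctl is on the PATH and exposes create, status, event and log subcommands together with a global --namespace flag; the namespace "spark" and the function names are hypothetical. It does not parse sparkctl's output, since that format is not specified here.

```python
"""Sketch of the submit-and-monitor loop from a stat box, driving sparkctl.

Assumes sparkctl is on PATH with create/status/event/log subcommands and a
global --namespace flag. The namespace "spark" is a hypothetical placeholder.
"""
import subprocess
from typing import List

NAMESPACE = "spark"  # hypothetical namespace for POC jobs


def sparkctl(*args: str, namespace: str = NAMESPACE) -> List[str]:
    """Build a sparkctl command line for the given subcommand and arguments."""
    return ["sparkctl", *args, "--namespace", namespace]


def run(cmd: List[str]) -> str:
    """Run a command, raising CalledProcessError on failure, and return stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def submit_and_inspect(manifest: str, app_name: str) -> None:
    """Submit a SparkApplication, then print its status, events and driver log."""
    run(sparkctl("create", manifest))
    print(run(sparkctl("status", app_name)))
    print(run(sparkctl("event", app_name)))
    # The driver log is only available once the driver pod has started, so in
    # practice the status call would be repeated until the job is running.
    print(run(sparkctl("log", app_name)))
```

Keeping command construction separate from execution makes it easy to print or check the exact sparkctl invocations without touching the cluster.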

Event Timeline

EChetty moved this task from Backlog to Investigate on the Foundational Technology Requests board.
EChetty changed the task status from Open to In Progress (Jan 18 2023, 11:56 AM).

Removing inactive assignee (please do so as part of team offboarding!).