
Enable spark jobs on the dse-k8s cluster via the spark-operator
Open, In Progress, Medium, Public

Description

This ticket is closely aligned with T318535: Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1.5 Sprints]. It forms part of an early experimentation phase of T308317: Data Infrastructure as a Service MVP, in support of T302728: Analytics Platform Future State Planning.

We would like to be able to test the Spark on K8S operator on the DSE cluster: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

The intended outcome is to be able to execute a spark job as a normal user on a stat box using sparkctl create
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/sparkctl/README.md#create.

The nature of the spark job itself is not important at this stage. It could be stateless.
In future we will need to investigate the capabilities of both the HDFS and Ceph storage back-ends.
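
As an illustration of the kind of stateless job described above, here is a minimal sketch of a SparkApplication manifest built in Python. The application name, namespace, image, jar path, Spark version and service account are all placeholder assumptions and not values from this ticket; the field names follow the operator's v1beta2 SparkApplication resource. The manifest is written as JSON to avoid a YAML dependency; whether sparkctl create accepts JSON input directly has not been verified here, so convert it to YAML if needed.

```python
import json


def build_spark_application(name, namespace, image, main_class, app_file, spark_version):
    """Return a minimal SparkApplication manifest as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,
            "mainClass": main_class,
            "mainApplicationFile": app_file,
            "sparkVersion": spark_version,
            "restartPolicy": {"type": "Never"},
            # The service account name is a placeholder assumption.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": 2, "memory": "512m"},
        },
    }


# All argument values below are hypothetical placeholders.
manifest = build_spark_application(
    name="spark-pi",
    namespace="spark-jobs",
    image="registry.example.org/spark:3.3.0",
    main_class="org.apache.spark.examples.SparkPi",
    app_file="local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar",
    spark_version="3.3.0",
)
manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
```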

Goal:
Run the Spark K8s operator on the DSE cluster

Tasks:

  • Make the spark-on-k8s operator packages/images available for use
  • Add the spark-on-k8s operator privileged components to the dse-k8s cluster
  • Add the sparkctl binary to the stat boxes
  • Submit a spark job to the dse-k8s cluster

Outcomes:

  • Can successfully launch a spark job on the dse-k8s cluster with sparkctl from a stat box, and monitor its execution and logs.
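
The launch-and-monitor flow in this outcome could look roughly like the sketch below, which only assembles and prints the sparkctl command lines and does not run them. The manifest file name, application name and namespace are hypothetical placeholders; the create, status and log subcommands and the --namespace flag are taken from the sparkctl README, but the exact invocation on a stat box is an assumption.

```python
import shlex


def sparkctl_commands(manifest_path, app_name, namespace):
    """Return the sparkctl invocations to submit, check and read logs for one job."""
    ns = ["--namespace", namespace]
    return [
        ["sparkctl", "create", manifest_path, *ns],  # submit the SparkApplication
        ["sparkctl", "status", app_name, *ns],       # check the application status
        ["sparkctl", "log", app_name, *ns],          # fetch the driver logs
    ]


# File name, application name and namespace are hypothetical placeholders.
commands = sparkctl_commands("spark-pi.yaml", "spark-pi", "spark-jobs")
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run once sparkctl is installed and the user has credentials for the cluster.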

Event Timeline

EChetty moved this task from Backlog to Investigate on the Foundational Technology Requests board.
EChetty changed the task status from Open to In Progress. Jan 18 2023, 11:56 AM

Removing inactive assignee (please do so as part of team offboarding!).

BTullis renamed this task from "POC for Running Spark on DSE" to "Enable spark jobs on the dse-k8s cluster via the spark-operator". Jul 18 2023, 11:08 AM
BTullis triaged this task as Medium priority.
BTullis updated the task description.
BTullis removed a subscriber: EChetty.