
Enable spark jobs on the dse-k8s cluster via the spark-operator
Open, In Progress, MediumPublic

Description

Update, October 2025

We have installed the spark-operator on the dse-k8s-eqiad cluster, and it can be used to execute an example job.
However, we now need to facilitate more meaningful testing by the Data-Engineering team, which means that Spark must be able to reach data sources and sinks, with appropriate authentication, authorization, and monitoring capabilities.
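
For illustration, a minimal SparkApplication manifest for the kubeflow spark-operator might look like the following sketch. The image, namespace, and service-account names here are placeholders, not the actual dse-k8s configuration:

```yaml
# Hypothetical minimal SparkApplication for the kubeflow spark-operator.
# Image, namespace, and serviceAccount values are illustrative only.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-test
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: docker-registry.example.org/spark:3.4.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.4.1"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark-driver
  executor:
    instances: 2
    cores: 1
    memory: 1g
```

Once applied, the operator creates the driver pod, which in turn requests executor pods from the Kubernetes API.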

Our goal is for Airflow to be able to launch Spark jobs that run on the dse-k8s cluster, so we have shifted the focus away from regular users on stat servers for the time being.

The sparkctl binary has been deprecated and dropped from recent versions of the spark-operator.

The spark-operator project itself has been adopted by Kubeflow, so it is now hosted at: https://github.com/kubeflow/spark-operator

Original ticket description follows:

This ticket is closely aligned with T318535: Document ideas & investigation results from our spike with "Spark on k8s" [SPIKE - 1.5 Sprints] and forms part of an early, experimentation phase of T308317: Data Infrastructure as a Service MVP, in support of T302728: Analytics Platform Future State Planning.

We would like to be able to test the Spark on K8S operator on the DSE cluster: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

The intended outcome is to be able to execute a spark job as a normal user on a stat box using sparkctl create
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/sparkctl/README.md#create.

The nature of the Spark job itself is not important at this stage; it could be stateless.
In future, we will need to investigate the capabilities of both the HDFS and Ceph storage back-ends.
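
For reference, pointing Spark at an S3-compatible store such as Ceph RGW would typically involve `hadoopConf`/`sparkConf` entries along these lines; the endpoint and credentials provider shown are placeholders, not a tested configuration for this cluster:

```yaml
# Illustrative only: s3a settings for an S3-compatible (e.g. Ceph RGW) back-end,
# as they might appear in a SparkApplication spec.
spec:
  hadoopConf:
    fs.s3a.endpoint: "https://rgw.example.org"
    fs.s3a.path.style.access: "true"
  sparkConf:
    spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
```

An HDFS back-end would instead require the cluster's Hadoop configuration and Kerberos credentials to be available to the driver and executor pods.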

Goal:
Run the Spark K8s Operator on the DSE cluster

Task:

  • Make the spark-on-k8s operator packages/images available for use
  • Add the spark-on-k8s operator privileged components to the dse-k8s cluster
  • Add the sparkctl binary to the stat boxes
  • Submit a spark job to the dse-k8s cluster

Outcomes:

  • Can successfully launch a spark job on the dse-k8s cluster with sparkctl from a stat box and monitor/log its execution.


Event Timeline

EChetty removed the point value 10 for this task.Sep 29 2022, 12:57 PM
EChetty moved this task from Backlog to Investigate on the Foundational Technology Requests board.
EChetty changed the task status from Open to In Progress.Jan 18 2023, 11:56 AM

Removing inactive assignee (please do so as part of team offboarding!).

BTullis renamed this task from POC for Running Spark on DSE to Enable spark jobs on the dse-k8s cluster via the spark-operator.Jul 18 2023, 11:08 AM
BTullis triaged this task as Medium priority.
BTullis updated the task description. (Show Details)
BTullis removed a subscriber: EChetty.

Change #1110883 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] airflow: Allow specific task pods to access the kube-api

https://gerrit.wikimedia.org/r/1110883

Change #1110883 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: Allow specific task pods to access the kube-api

https://gerrit.wikimedia.org/r/1110883

Change #1111206 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] airflow: Use the existing labels for kubernetes and spark operators

https://gerrit.wikimedia.org/r/1111206

Change #1111206 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: Use the existing labels for kubernetes and spark operators

https://gerrit.wikimedia.org/r/1111206

Change #1111278 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] airflow: Add a separate networkpolicy for task-pods to access k8s API

https://gerrit.wikimedia.org/r/1111278

Change #1111278 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: Add a separate networkpolicy for task-pods to access k8s API

https://gerrit.wikimedia.org/r/1111278

Aklapper changed the task status from In Progress to Open.Mar 22 2025, 7:24 AM

Resetting task status from "In Progress" to "Open" as this task has been "in progress" for more than two years.

BTullis changed the task status from Open to In Progress.Sep 17 2025, 5:33 PM
BTullis subscribed.

Resetting to "In Progress" as we are now planning to carry on working on this.

The target will change somewhat: we will be less concerned with regular users being able to launch Spark jobs from stat boxes, and more concerned with being able to launch Spark jobs from Airflow tasks.
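
As a sketch of what such an Airflow task might submit (the names, image, and paths below are hypothetical, and a real DAG would more likely use a provider operator such as SparkKubernetesOperator rather than building the resource by hand), the SparkApplication body can be constructed as a plain dict and posted to the Kubernetes API:

```python
# Sketch: build the SparkApplication custom-resource body that an Airflow
# task could submit to the Kubernetes API via the spark-operator.
# All names, images, and paths are placeholders.
def spark_application(name: str, image: str, main_file: str,
                      namespace: str = "spark", executors: int = 2) -> dict:
    """Return a SparkApplication resource body for the spark-operator."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "driver": {"cores": 1, "memory": "1g",
                       "serviceAccount": "spark-driver"},
            "executor": {"instances": executors, "cores": 1, "memory": "1g"},
        },
    }

# Example: the manifest an Airflow task would hand to the k8s custom-objects API.
app = spark_application("pi-test", "spark:3.4.1",
                        "local:///opt/spark/examples/src/main/python/pi.py")
```

The networkpolicy changes in the patches above are what allow such a task pod to reach the kube-api and create this resource.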

Change #1189279 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Fix the webhook TLS configuration for the spark-operator

https://gerrit.wikimedia.org/r/1189279

Change #1189279 merged by jenkins-bot:

[operations/deployment-charts@master] Fix the webhook TLS configuration for the spark-operator

https://gerrit.wikimedia.org/r/1189279