DSE Experiment - User Story 2 (Make Compute available)
Closed, Duplicate · Public

Assigned To

None

Authored By

	• EChetty
	Jan 18 2023, 11:48 AM

Description

User Story

As a Wikimedia researcher, data scientist, or engineer, I want to be able to launch Spark jobs on the Data Science and Engineering Kubernetes Cluster from a UI or CLI, so that I can easily run experiments on large datasets stored in HDFS or an object storage system.

Acceptance Criteria

The user should be able to run Spark jobs on the cluster that access Kerberized services (such as HDFS) from Kubernetes.
The user should be able to run Spark jobs in interactive (REPL) mode.
The user should be able to manipulate files stored in HDFS or an object storage system as part of their experiments.
The user should be able to do this securely and with minimal configuration.
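
As an illustration of what launching a job from a CLI might involve, here is a minimal sketch that builds a SparkApplication manifest of the kind consumed by the spark-operator (see T318712). It assumes the operator's `sparkoperator.k8s.io/v1beta2` API; the job name, namespace, image, file path, service account, Spark version, and resource sizes are all hypothetical placeholders, not values taken from the dse-k8s cluster. Kerberos and HDFS configuration are deliberately left out.

```python
import json


def build_spark_application(name, namespace, image, main_file, executors=2):
    """Return a SparkApplication manifest (spark-operator v1beta2) as a dict.

    All argument values used below are hypothetical; adjust them to the
    real cluster configuration.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.3.0",  # assumed version, for illustration only
            "driver": {
                "cores": 1,
                "memory": "1g",
                "serviceAccount": "spark",  # hypothetical service account
            },
            "executor": {"instances": executors, "cores": 1, "memory": "2g"},
        },
    }


manifest = build_spark_application(
    name="example-job",                          # hypothetical job name
    namespace="spark",                           # hypothetical namespace
    image="example.invalid/spark:latest",        # hypothetical image
    main_file="local:///opt/spark/examples/src/main/python/pi.py",
)

# Kubernetes accepts JSON manifests, so the output can be saved to a file
# and submitted with `kubectl apply -f <file>`.
print(json.dumps(manifest, indent=2))
```

Generating the manifest from a small function keeps the per-job settings (name, image, executor count) separate from the boilerplate, which is one possible way to approach the "minimal configuration" criterion above.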

Outstanding Questions:

How would we manage resource contention?
Can we access GPU hardware from a Spark job on the DSE Cluster?

Related Objects

Status	Assigned	Task
In Progress	None	T327267 Create a DSE Kubernetes cluster with support for persistent storage from Ceph
Duplicate	None	T327258 DSE Experiment - User Story 2 (Make Compute available)
In Progress	None	T318712 Enable spark jobs on the dse-k8s cluster via the spark-operator
Resolved	BTullis	T318730 Add spark and spark-operator images to operations/docker-images/production-images
Duplicate	None	T318923 Add the sparkctl binary to the stat boxes
Resolved	BTullis	T318924 Submit a spark job to the dse-k8s cluster
Duplicate	None	T318925 Getting the Metrics API (K8) functioning to support Auto Scaling
Resolved	BTullis	T318926 Deploy spark-operator to the dse-k8s cluster
Resolved	BTullis	T321686 Create namespaces and kubernetes users for spark-operator and for spark jobs
Resolved	BTullis	T322635 Define necessary RBAC rules for spark on dse-k8s cluster
Open	None	T332912 [dse-k8s] Provide common hive config for spark jobs
Open	None	T332913 [dse-k8s] Provide common spark config for spark jobs
In Progress	None	T332909 [dse-k8s] Provide common hadoop config for spark jobs
In Progress	None	T332908 [dse-k8s] Spark-deploy need to create secret object in spark namespace
Open	None	T331971 [dse-k8s] Deploy spark cli to submit jobs on DSE K8S cluster with K8S config
Resolved	BTullis	T327257 DSE Experiment - User Story 1 (Address Kerberos)
Resolved	BTullis	T330162 Research and test methods for accessing kerberized services from spark running on the DSE K8S cluster

Event Timeline

• EChetty created this task. · Jan 18 2023, 11:48 AM

Restricted Application added a subscriber: Aklapper. · Jan 18 2023, 11:48 AM

• EChetty mentioned this in T327267: Create a DSE Kubernetes cluster with support for persistent storage from Ceph. · Jan 18 2023, 12:10 PM

• EChetty added a parent task: T327267: Create a DSE Kubernetes cluster with support for persistent storage from Ceph.

• EChetty added a subtask: T318712: Enable spark jobs on the dse-k8s cluster via the spark-operator. · Jan 18 2023, 12:14 PM

• EChetty moved this task from Backlog to To be discussed on the Shared-Data-Infrastructure board. · Jan 18 2023, 12:18 PM

• EChetty moved this task from To be discussed to EQ2 Kanban (Sprints 04-07) on the Shared-Data-Infrastructure board. · Jan 23 2023, 1:31 PM

• EChetty edited projects, added Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)); removed Shared-Data-Infrastructure.

• EChetty moved this task from EQ2 Kanban (Sprints 04-07) to 2022-23 Q4 Wrap up on the Shared-Data-Infrastructure board. · Feb 6 2023, 12:59 PM

• EChetty edited projects, added Shared-Data-Infrastructure (2022-23 Q4 Wrap up); removed Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)).

JArguello-WMF added a project: Epic. · Feb 21 2023, 1:11 PM

JArguello-WMF edited projects, added Shared-Data-Infrastructure; removed Shared-Data-Infrastructure (2022-23 Q4 Wrap up). · Mar 14 2023, 3:49 PM

JArguello-WMF moved this task from Backlog to Epics on the Shared-Data-Infrastructure board.

JArguello-WMF moved this task from Epics to To be discussed on the Shared-Data-Infrastructure board. · Jun 29 2023, 1:42 PM

BTullis closed this task as a duplicate of T318712: Enable spark jobs on the dse-k8s cluster via the spark-operator. · Jul 18 2023, 11:01 AM
