DSE Experiment - User Story 2 (Make Compute available)
Closed, Duplicate (Public)

Description

User Story

As a Wikimedia researcher, data scientist, or engineer, I want to be able to launch Spark jobs on the Data Science and Engineering Kubernetes Cluster from a UI or CLI, so that I can easily run experiments on large datasets stored in HDFS or an object storage system.

Acceptance Criteria

  • The user should be able to run Spark jobs on the cluster that access Kerberized services (such as HDFS) from Kubernetes.
  • The user should be able to run Spark jobs in interactive (REPL) mode.
  • The user should be able to manipulate files stored in HDFS or an object storage system as part of their experiments.
  • The user should be able to do this securely and with minimal configuration.
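
To make the criteria above concrete, here is a minimal sketch of how a CLI wrapper might assemble a `spark-submit` invocation for Spark on Kubernetes with Kerberos credentials. It is an illustration only, not the DSE cluster's actual tooling: the helper name `build_spark_submit_args`, the API server URL, namespace, container image, principal, keytab path, and application path are all hypothetical placeholders. The `--master k8s://...` form and the `spark.kubernetes.*` and `spark.kerberos.*` configuration keys are standard Spark settings.

```python
from typing import Dict, List, Optional


def build_spark_submit_args(
    k8s_api_url: str,
    namespace: str,
    image: str,
    principal: str,
    keytab_path: str,
    app_path: str,
    deploy_mode: str = "cluster",
    extra_conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper).

    Nothing is executed here; the caller decides how to run the command.
    """
    conf = {
        # Where the driver and executor pods are created.
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        # Kerberos identity used to reach Kerberized services such as HDFS.
        "spark.kerberos.principal": principal,
        "spark.kerberos.keytab": keytab_path,
        # Assumed location of the Kerberos client config on the submitting host.
        "spark.kubernetes.kerberos.krb5.path": "/etc/krb5.conf",
    }
    if extra_conf:
        conf.update(extra_conf)

    args = [
        "spark-submit",
        "--master", f"k8s://{k8s_api_url}",
        "--deploy-mode", deploy_mode,
    ]
    # Sort keys so the generated command is deterministic.
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app_path)
    return args


# Example with placeholder values (all hypothetical).
args = build_spark_submit_args(
    k8s_api_url="https://k8s.example.org:6443",
    namespace="spark-experiments",
    image="registry.example.org/spark:latest",
    principal="analyst@EXAMPLE.ORG",
    keytab_path="/etc/security/keytabs/analyst.keytab",
    app_path="local:///opt/spark/jobs/experiment.py",
)
print(" ".join(args))
```

An interactive (REPL) session would need `deploy_mode="client"`, since Spark shells run the driver on the submitting side; how the driver is then made reachable from the executor pods is left open here.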

Outstanding Questions:

  • How would we manage resource contention?
  • Can we access GPU hardware from a Spark job on the DSE Cluster?
