DSE Experiment - User Story 4 (Machine Learning Use Case)
Closed, Declined · Public

Assigned To

None

Authored By

	• EChetty
	Jan 18 2023, 11:54 AM

Description

User Story

As a Wikimedia machine learning engineer or researcher, I want to be able to develop or train a machine learning model on the Data Science and Engineering Kubernetes Cluster using Kubeflow, so that I can rapidly iterate in model development.

Acceptance Criteria

The engineer or researcher should be able to develop or train a machine learning model on the Data Science and Engineering Kubernetes Cluster using Kubeflow and Ceph.
The engineer or researcher should be able to iterate rapidly in model development.
Kubeflow should provide the tools and functionality needed to make model development easy and efficient.

Outstanding Questions:

Which components of Kubeflow are required vs. optional for us? e.g.
- Jupyter Notebooks
- Pipelines
- Distributed training (PyTorch, tf-jobs)
- Central UI Dashboard
Can we run a Jupyter Notebook that uses data from HDFS?

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Gehel	T327267 Create a DSE Kubernetes cluster with support for persistent storage from Ceph
		Declined		None	T327262 DSE Experiment - User Story 4 (Machine Learning Use Case)

Event Timeline

• EChetty created this task. Jan 18 2023, 11:54 AM

Restricted Application added a subscriber: Aklapper. Jan 18 2023, 11:54 AM

• EChetty updated the task description. Jan 18 2023, 11:55 AM

• EChetty mentioned this in T327267: Create a DSE Kubernetes cluster with support for persistent storage from Ceph. Jan 18 2023, 12:10 PM

• EChetty added a parent task: T327267: Create a DSE Kubernetes cluster with support for persistent storage from Ceph.

• EChetty moved this task from Backlog to To be discussed on the Shared-Data-Infrastructure board. Jan 18 2023, 12:18 PM

• EChetty moved this task from To be discussed to EQ2 Kanban (Sprints 04-07) on the Shared-Data-Infrastructure board. Jan 23 2023, 1:31 PM

• EChetty edited projects, added Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)); removed Shared-Data-Infrastructure.

• EChetty moved this task from EQ2 Kanban (Sprints 04-07) to 2022-23 Q4 Wrap up on the Shared-Data-Infrastructure board. Feb 6 2023, 12:59 PM

• EChetty edited projects, added Shared-Data-Infrastructure (2022-23 Q4 Wrap up); removed Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)).

JArguello-WMF added a project: Epic. Feb 21 2023, 1:11 PM

JArguello-WMF edited projects, added Shared-Data-Infrastructure; removed Shared-Data-Infrastructure (2022-23 Q4 Wrap up). Mar 14 2023, 3:50 PM

JArguello-WMF moved this task from Backlog to Epics on the Shared-Data-Infrastructure board.

JArguello-WMF moved this task from Epics to To be discussed on the Shared-Data-Infrastructure board. Jun 29 2023, 1:42 PM

For now, we are not going to work on Kubeflow for the dse-k8s cluster. We may revisit this decision in the future.
