DSE Experiment - User Story 4 (Machine Learning Use Case)
Closed, Declined | Public

Description

User Story

As a Wikimedia machine learning engineer or researcher, I want to be able to develop or train a machine learning model on the Data Science and Engineering Kubernetes Cluster using Kubeflow, so that I can rapidly iterate in model development.

Acceptance Criteria

  • The engineer or researcher should be able to develop or train a machine learning model on the Data Science and Engineering Kubernetes Cluster using Kubeflow and Ceph.
  • The engineer or researcher should be able to iterate rapidly in model development.
  • Kubeflow should provide the necessary tools and functionality to make model development easy and efficient.

Outstanding Questions:

  • Which components of Kubeflow are required vs. optional for us? e.g.
    • Jupyter Notebooks
    • Pipelines
    • Distributed training (PyTorch, tf-jobs)
    • Central UI Dashboard
  • Can we run a Jupyter Notebook that uses data from HDFS?

Event Timeline

BTullis subscribed.

For now, we are not going to be working on Kubeflow for the dse-k8s cluster. We may revisit this decision in future.