
Deploy pipeline under DSE namespace
Closed, Invalid · Public

Description

With the completion of https://phabricator.wikimedia.org/T377266, which was for experimental purposes, we would like to deploy a select number of pipelines onto the DSE infrastructure.

This would include deploying the HTML work (part of the moderator project) to collect event history for experimental purposes.

Details

Due Date
May 30 2025, 4:00 AM

Event Timeline

XiaoXiao-WMF updated the task description.

Weekly updates

  • Confirmed that the namespace works as expected, including being able to allocate an AMD GPU to a pod.
  • Next is to build a Docker image using a trusted GitLab runner and publish it to the Wikimedia registry, which is required for running services on Kubernetes. We decided to add the research-datasets repo to the list of trusted runners for that purpose and to focus on LLM inference workloads at first.

Weekly updates

  • Progress towards building a ROCm wheel for fast attention in GitLab CI, using a custom GitLab runner. Resource challenges remain around runtime (10+ hours) and disk requirements (60 GB).

Weekly updates

  • The fast attention wheel can now be reliably built by CI and can be used to build Docker images for the research-datasets and llmperf repos.

Moving from the quarterly lane to in-progress as I'm closing the quarterly lane. Please set/update the deadline for the task.

fkaelin set Due Date to May 30 2025, 4:00 AM. (Apr 7 2025, 4:29 PM)

Closing this in favor of T396495, which will provide the scaffolding needed. Research can use this work as a starting point when there is a specific service to deploy on the DSE cluster.