
DSE kubernetes namespace for llm-inference
Closed, ResolvedPublic

Description

Research wants to deploy workloads on the DSE cluster

  • Examples include: a Postgres DB for state that requires random access, a vector database for a research use case, PoC endpoints (APIs not a fit for Lift Wing, internal data exploration UIs, using Kafka events), and LLM inference on GPU
  • Initially the namespace will be used internally for testing/development; hopefully/presumably there will eventually be dedicated namespaces for shared infrastructure (e.g. database-as-a-service, ML inference/training on GPU)

Event Timeline

BTullis added subscribers: Gehel, odimitrijevic, BTullis.

I'm tagging some people and projects for visibility and approval.

I think this is a great idea in terms of self-service, but maybe we should try to split your examples out into separate requests at an early stage.
Having just one research namespace could become unmanageable if you are experimenting with different technologies in it concurrently.

We could perhaps start with an llm-inference namespace?
If you can expand on the postgres use-case, that might help too.

We currently deploy one PostgreSQL cluster per application, and these are at present limited to Airflow instances.
It would be great to understand how much data you would be intending to use in postgresql and what the expected usage pattern would be.

XiaoXiao-WMF changed the task status from Open to In Progress.Nov 20 2024, 3:10 PM
XiaoXiao-WMF assigned this task to fkaelin.
XiaoXiao-WMF set Due Date to Fri, Dec 20, 5:00 AM.

Picking this back up. Thanks for the background Ben.

In this case I agree that starting with an llm-inference namespace makes the most sense, especially as this is also our most important active use case. As part of SDS 1.2.1 (Test existing AI models for internal use-cases) we are running tests on the GPUs installed on the new ml-labs instances. We are facing limitations in what we can install/run due to the lack of Docker, and it would be helpful and informative to run these LLM inference workloads on the "untapped" MI210 GPUs on the DSE cluster.

We can publish a Docker image for the workload to the WMF registry, but it would be great to get some hands-on help to set up the Helm charts and the review/deployment steps, as this is Research's first namespace and it also requires the provisioning of a GPU (for which I hope/expect we can lean on existing charts from the ml-serve cluster). What are the next steps for this?
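For illustration, the GPU part of such a workload might look something like the fragment below in a pod spec or chart values. This is only a sketch: the field layout is hypothetical, and it assumes the cluster runs the AMD device plugin, which exposes MI210s under the `amd.com/gpu` extended resource name.

```yaml
# Hypothetical resource request for one MI210 GPU.
# Assumes the AMD k8s device plugin is enabled on the DSE cluster;
# field names here are illustrative, not the actual chart schema.
resources:
  requests:
    amd.com/gpu: 1
    memory: 32Gi
  limits:
    amd.com/gpu: 1      # extended resources must have requests == limits
    memory: 32Gi
```

Note that Kubernetes requires extended resources such as `amd.com/gpu` to be specified with equal requests and limits; they cannot be overcommitted.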

Assigning to myself to pick this up for now.

BTullis renamed this task from DSE kubernetes namespace for Research to DSE kubernetes namespace for llm-inference.Tue, Dec 10, 4:42 PM

Change #1102284 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] dse-k8s: Add a namespace for llm-inference work by the ML team

https://gerrit.wikimedia.org/r/1102284

Change #1102287 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] dse-k8s: Add token for the llm-inference namespace

https://gerrit.wikimedia.org/r/1102287

Change #1102287 merged by Btullis:

[operations/puppet@production] dse-k8s: Add tokens for the llm-inference namespace

https://gerrit.wikimedia.org/r/1102287

Change #1102284 merged by jenkins-bot:

[operations/deployment-charts@master] dse-k8s: Add a namespace for llm-inference work by the ML team

https://gerrit.wikimedia.org/r/1102284

BTullis added subscribers: MunizaA, gmodena.

This is now ready for use.
Currently, three users are permitted to access this namespace, by virtue of being members of the research-deployers group.

btullis@deploy2002:~$ getent group research-deployers
research-deployers:x:835:fab,gmodena,mnz

That's @fkaelin, @gmodena, and @MunizaA. If you would like to modify this access list, let me know or create a ticket.

The next step is probably to start to create a helm chart for the new llm-inference work that you would like to do.

If you let us know what sort of processes, inputs, and outputs you expect from the work, then we can likely help you make a start here. This will be much more manageable than simply using kubectl to deploy resources into the namespace.
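As a rough sketch of how those processes, inputs, and outputs could map onto chart values, something like the fragment below might emerge. All field names and the image path are assumptions for illustration; the real chart should follow the conventions and shared templates in operations/deployment-charts.

```yaml
# Illustrative values.yaml fragment only; not the deployment-charts schema.
main_app:
  image: docker-registry.wikimedia.org/llm-inference   # hypothetical image name
  version: 0.0.1
  port: 8080           # e.g. an HTTP inference endpoint
kafka:
  topics:
    - example.input.topic   # placeholder for a Kafka event input
```

Starting from a values file like this makes the workload's inputs and outputs explicit and reviewable, which is the main advantage over ad-hoc kubectl deployments.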
In the meantime, I'll close this ticket, if that's OK. Feel free to tag us on any follow-ups and reach out if you would like assistance getting started.