
2024 Q4 Goal: A HuggingFace 7B LLM is hosted on ml-staging on Lift Wing, powered by a GPU
Open, Needs Triage, Public

Event Timeline

  • GPU order for the first 2x-GPU chassis is close to complete. There are some supply issues with the chassis, so the question is whether we want to use an upgraded chassis for the ml-staging server.

Update: We have Mistral-7B-Instruct hosted on ml-staging, running on CPU and using the PyTorch base image we created. A simple request takes approx. 30s (we haven't run extensive tests yet).
We are facing some issues using the GPU with this Docker image at the moment, as documented in T362984: GPU errors in hf image in ml-staging.
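
For reference, a minimal sketch of what a simple request against the staged model could look like, assuming a KServe-style v1 predict endpoint; the URL and payload shape are assumptions for illustration, not the actual Lift Wing API:

```
import requests

# Hypothetical endpoint; the real ml-staging host and model path differ.
URL = "https://inference-staging.example.wmnet/v1/models/mistral-7b-instruct:predict"

# Payload shape is an assumption (KServe v1 "instances" convention).
payload = {"instances": [{"prompt": "What is Lift Wing?", "max_new_tokens": 64}]}

# CPU inference is slow (~30s per request), so use a generous timeout.
resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```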

Decision point: Do we upgrade ROCm drivers?

Aiko is getting up to speed with how HF sets up its inference endpoints and may be able to adapt that into our own HF server.
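
As a rough idea of what adapting this into our own HF server could look like, here is a minimal sketch using KServe's Python SDK to wrap a transformers text-generation pipeline; the class name, model id, and payload shape are illustrative assumptions, not the actual implementation:

```
from kserve import Model, ModelServer
from transformers import pipeline

class HFTextGenModel(Model):
    """Sketch: a custom KServe model wrapping a HF text-generation pipeline."""

    def __init__(self, name: str):
        super().__init__(name)
        self.generator = None
        self.load()

    def load(self):
        # Model id is illustrative; on Lift Wing the weights would come
        # from our own storage rather than the HF hub.
        self.generator = pipeline("text-generation",
                                  model="mistralai/Mistral-7B-Instruct-v0.2")
        self.ready = True

    def predict(self, payload: dict, headers: dict = None) -> dict:
        prompt = payload["instances"][0]["prompt"]
        out = self.generator(prompt, max_new_tokens=64)
        return {"predictions": [out[0]["generated_text"]]}

if __name__ == "__main__":
    ModelServer().start([HFTextGenModel("mistral-7b-instruct")])
```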

We have a theory that the ROCm drivers from the Debian package are not required.
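
If that theory holds, the user-space ROCm libraries bundled with the PyTorch wheel should be enough, with only the amdgpu kernel driver coming from the host. A quick sketch to introspect what the container actually sees:

```
import torch

# On a ROCm build of PyTorch, torch.version.hip is set and the
# CUDA-compatible API is backed by HIP.
print("torch:", torch.__version__)
print("hip runtime:", torch.version.hip)          # None on CUDA-only builds
print("gpu visible:", torch.cuda.is_available())  # needs only the kernel driver
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```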

Update:

  • Wait for the vendor (Supermicro) to finalize the order of the 2x chassis for ml-staging.
  • Chris's guess is that ml-staging will be installed at the end of the quarter.

Update:

As part of the task T362984: GPU errors in hf image in ml-staging, we have also experimented with different versions of pytorch (2.2.1, 2.3.0) and rocm (5.6, 5.7, 6.0), and we are still hitting the same issue.
To clarify: the GPU works properly with pytorch 2.0.1 and rocm 5.4.2, but these versions are too old to be used with the huggingfaceserver.
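
For context, the per-combination check is essentially a small GPU smoke test like the sketch below; on a working stack the matmul succeeds, and on the broken combinations it surfaces the error tracked in T362984:

```
import torch

# Minimal GPU smoke test: run a small matmul on the device.
# ROCm builds of PyTorch expose the GPU through the cuda API.
assert torch.cuda.is_available(), "GPU not visible to PyTorch"

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b
torch.cuda.synchronize()  # force the kernel to actually run
print("matmul ok:", tuple(c.shape), "on", torch.cuda.get_device_name(0))
```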

Update:

  • Still can't use the GPU with ROCm, but we figured out what the bug is: it will be fixed once the host OS is upgraded to Bookworm.
  • Next step is to upgrade ml-staging to Bookworm, then test.
  • Working on upgrading the HF image to newer versions with ROCm 6.0. Tested them, they work, and a patch will be posted.
  • Goal is to utilize the GPU so we can deploy models from HuggingFace.
  • Mistral is crashlooping; startup checks usually allow 5m, so we bumped them to 10m, but it didn't help (see the load-time sketch after this list).
  • A BERT model works, so it is likely a Mistral-specific issue.
  • The kubelet partition increase for the install phase is in review.
  • ml-staging1001 is now on Bookworm; dragonfly (distributed downloading of S3 artifacts) needs to be bumped.
  • With Bookworm, there are no longer GPU drivers on the base node (besides Debian kernel support); the driver/library code lives in the Docker images instead.
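
On the crashlooping above: one way to sanity-check whether even a 10m startup window is in the right ballpark is to time the model load in isolation, e.g. with a rough sketch like this (the model id is illustrative; on Lift Wing the weights come from our own storage):

```
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough timing of the model load, to sanity-check the startup-probe budget.
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative model id

t0 = time.time()
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
print(f"loaded in {time.time() - t0:.0f}s")  # compare against the 5m/10m window
```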