Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T362670 2024 Q4 Goal: A HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU
Open | | isarantopoulos | T357986 Use Huggingface model server image for HF LLMs
Open | BUG REPORT | None | T362984 GPU errors in hf image in ml-staging
Open | | None | T363191 Test if we can avoid ROCm debian packages on k8s nodes
Event Timeline
- The GPU order for the first 2x GPU chassis is close to complete. There are some supply issues with the chassis, so the question is whether we want to use an upgraded chassis for the ml-staging server.
Update: We have Mistral-7b-instruct hosted on ml-staging. It runs on CPU and uses the PyTorch base image we created. A simple request takes approximately 30 s (we haven't run extensive tests yet).
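For reference, a request to the hosted model can be sketched like this. This is a hedged example: the endpoint URL and the payload shape (KServe-style `instances` with `prompt` and `max_new_tokens` fields) are assumptions for illustration, not the exact ml-staging configuration.

```python
# Sketch of a generation request to the hosted Mistral-7b-instruct model.
# ENDPOINT and the payload fields below are hypothetical placeholders.
import json

ENDPOINT = "https://example-staging-host/v1/models/mistral-7b-instruct:predict"  # placeholder

def build_payload(prompt: str, max_new_tokens: int = 64) -> str:
    """Serialize a minimal generation request as JSON."""
    return json.dumps(
        {"instances": [{"prompt": prompt, "max_new_tokens": max_new_tokens}]}
    )

payload = build_payload("What is Lift Wing?")
print(payload)
# The payload would then be POSTed to ENDPOINT, e.g. with requests or curl.
```

A simple timing test is just wrapping the POST in `time.perf_counter()` calls, which is roughly how the ~30 s figure above would be measured.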
We are currently facing some issues using the GPU with this Docker image, as documented in T362984: GPU errors in hf image in ml-staging.
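When debugging GPU errors like these, a first step is checking whether the ROCm device nodes are even visible inside the container. The sketch below uses only the standard ROCm device paths (`/dev/kfd`, `/dev/dri`) and the `HIP_VISIBLE_DEVICES` variable; it is a generic diagnostic, not the specific procedure used on ml-staging.

```python
# Quick in-container check for the kernel device nodes ROCm user space needs.
import os

def rocm_device_report() -> dict:
    """Report visibility of standard ROCm device nodes and HIP env config."""
    return {
        "/dev/kfd": os.path.exists("/dev/kfd"),   # amdgpu compute (KFD) interface
        "/dev/dri": os.path.isdir("/dev/dri"),    # DRM render nodes
        "HIP_VISIBLE_DEVICES": os.environ.get("HIP_VISIBLE_DEVICES", "<unset>"),
    }

report = rocm_device_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `/dev/kfd` is missing, the pod spec (device plugin resource request or volume mounts) is usually the problem rather than the image itself, which is relevant to T363191's question about avoiding ROCm Debian packages on the k8s nodes.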