
Q3 2024 Goal: Inference Optimization for Hugging face/Pytorch models
Open, Needs Triage, Public

Description

As an engineer,
I want to optimize the inference performance of transformer models served with PyTorch on AMD GPUs (MI100), so that I can achieve faster predictions with Large Language Models.
My goal is to identify and mitigate performance bottlenecks by leveraging techniques like quantization and efficient/smart batching, and to explore the boundaries of this specific GPU: what is the largest model we can host, and how fast does it run?
As part of this task we also want to document the extent of ROCm support in the libraries commonly used for inference optimization (e.g. accelerate, bitsandbytes, vLLM) and narrow down the options we have for AMD GPUs.
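The "largest model we can host" question above can be approached with a back-of-the-envelope memory estimate. A minimal sketch, assuming a simple pad-free weight count: the 20% margin for activations/KV cache is an illustrative assumption, not a measured value (the MI100 has 32 GiB of HBM2):

```python
# Rough GPU memory estimate for hosting an LLM at different precisions.
# The 1.2x overhead margin for activations and KV cache is an assumed
# illustrative figure, not a measurement.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory needed just for the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

def fits_on_gpu(n_params: float, dtype: str, gpu_gib: float = 32.0,
                overhead: float = 1.2) -> bool:
    """Check whether weights plus a ~20% margin fit on a single GPU
    (32 GiB default matches the MI100's HBM2 capacity)."""
    return weight_memory_gib(n_params, dtype) * overhead <= gpu_gib

# A 7B model in fp16 needs ~13 GiB of weights and fits comfortably;
# a 13B model in fp16 (~24 GiB of weights) leaves little headroom,
# and in fp32 it does not fit at all.
for n_params, name in [(7e9, "7B"), (13e9, "13B")]:
    for dtype in ("fp16", "int8"):
        print(name, dtype,
              round(weight_memory_gib(n_params, dtype), 1),
              fits_on_gpu(n_params, dtype))
```

The same arithmetic explains why quantization (int8/int4) is the main lever for pushing beyond 13B parameters on this card.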

Resources: PyTorch Model Inference optimization checklist, Huggingface GPU Inference optimization

Event Timeline

isarantopoulos renamed this task from Goal: Inference Optimization for Hugging face models to Goal: Inference Optimization for Hugging face/Pytorch models. Dec 13 2023, 3:55 PM

We now have access to perform operations on running pods in ml-staging-codfw (edit/exec/delete), so we can start working directly on the GPU.

Current status from the relevant subtask:
At the moment we are working on how to best serve 7B-parameter models on a GPU. We are using the Hugging Face serving runtime provided by KServe.
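For context, deploying a model through the KServe Hugging Face runtime looks roughly like the sketch below; the service name, model ID, and the `amd.com/gpu` resource name (from the AMD device plugin) are illustrative assumptions, not our production config:

```yaml
# Hypothetical InferenceService using KServe's huggingface runtime.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo            # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llm-demo
        - --model_id=some-org/some-7b-model   # placeholder model ID
      resources:
        limits:
          amd.com/gpu: "1"  # request one AMD GPU via the device plugin
```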
We plan to experiment with hosting larger models (>13B) and to explore the tradeoffs between serving a 7B model and a 13B model compressed via quantization or simple downcasting, which would have similar serving times.
If we remain in the area of 7B-parameter models, it is unlikely that we will use a vanilla version for any use case; however, fine-tuned versions of these LLMs may be good candidates for specific use cases.
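The "efficient/smart batching" mentioned in the description can be sketched as length bucketing: sort pending requests by length before batching so that pad-to-longest batches waste little padding. Character length here is a stand-in for token length, and the request strings are illustrative, not a real API:

```python
# Sketch of length-bucketed ("smart") batching: group requests of
# similar length so each pad-to-longest batch wastes little padding.
from typing import List

def bucket_requests(requests: List[str], max_batch: int = 8) -> List[List[str]]:
    """Sort requests by length, then slice into batches of at most max_batch."""
    ordered = sorted(requests, key=len)
    return [ordered[i:i + max_batch] for i in range(0, len(ordered), max_batch)]

def padding_waste(batch: List[str]) -> int:
    """Padding (in characters) added when the batch is padded to its longest item."""
    longest = max(len(r) for r in batch)
    return sum(longest - len(r) for r in batch)

# Mixed-length arrivals: naive arrival-order batching pads short
# prompts up to the long ones; bucketing avoids most of that.
requests = ["hi", "x" * 200, "a short prompt", "y" * 190]
naive = [requests[:2], requests[2:]]            # arrival order
smart = bucket_requests(requests, max_batch=2)  # length-bucketed
print(sum(map(padding_waste, naive)), sum(map(padding_waste, smart)))
```

On the example above, bucketing cuts total padding from hundreds of characters to a few dozen, which translates directly into fewer wasted FLOPs per batch on the GPU.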

calbon renamed this task from Goal: Inference Optimization for Hugging face/Pytorch models to Q3 2024 Goal: Inference Optimization for Hugging face/Pytorch models. Apr 16 2024, 2:51 PM