As an engineer,
I'd like to use the optimized rocm/vllm image on Lift Wing, so that I can utilize the required software without having to build and maintain everything on our own.
After the work done in T370149: [LLM] Use vllm for ROCm in huggingface image, it seems that building and maintaining our own set of packages poses significant challenges and workload for us. At the same time it doesn't seem like an ideal choice for production, as the frequent changes in this ecosystem of packages make us prone to errors and incompatibilities among their dependencies.
The latest image (rocm/vllm:v710inference_rocm6.3-release_ubuntu22.04_py3.10_pytorch_release-2.6) is a great improvement in size: it is only 7.7GB compressed, while the previous ones were >20GB. It is important to note here that our base pytorch images are ~18GB, so if the resulting image stays below 10GB it would offer a significant reduction in image sizes.
The image has the following software versions (full pip freeze result):
| software | version |
| --- | --- |
| ROCm | 6.3 |
| torch | 2.6.0 (built from source for ROCm) |
| vllm | 0.6.5 (built from source for ROCm) |
| triton | 3.0.0 (built from source for ROCm) |
The Dockerfiles for this image are defined in the ROCm/vllm repo: the rocm-vllm Dockerfile and rocm/vllm-dev. The rocm/vllm image uses rocm/vllm-dev as its base image.


