[LLM] Use vllm for ROCm in huggingface image
Closed, Resolved | Public | 8 Estimated Story Points

Description

After the successful deployment of a large transformer model from huggingface using the huggingface server available in kserve, we want to tackle using an inference optimization engine with our MI210 GPU.

According to the official docs, the MI210 seems to be supported by vllm, so we are now in a position to test the vllm backend with huggingface.
Following the official docs we can explore 2 alternatives. The first is to build from source:

PYTORCH_ROCM_ARCH=gfx90a python setup.py install

where gfx90a is the architecture for the MI200 series. The second is the separate docker image offered by the ROCm fork.
I'm exploring which solution is the simplest and easiest to maintain.
If we decide to use the vllm engine as an inference framework we can move its installation to a base pytorch image. However, for the time being I would avoid adding it there, as versions are changing quite often.

Another thing to figure out is the version discrepancy between vllm (latest version v0.4.3) and the ROCm vllm fork (latest version 0.4.0). The fork's version is inconsistent with the huggingfaceserver python module requirement, vllm = { version = "^0.4.2", optional = true }. This can be tackled if we use our fork of kserve, but that ends up being one more custom step in the build/update process. Since then, vllm releases have progressed to 0.5.2, which may be needed by huggingfaceserver, although it still requires v0.4.3.
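The mismatch can be checked mechanically. The sketch below is a minimal, hand-rolled reading of a Poetry-style caret constraint (the first non-zero component may not change, so "^0.4.2" means >=0.4.2 and <0.5.0), applied to the versions mentioned above. The helper names are hypothetical and it only handles plain dotted numeric versions; it is not how kserve or Poetry resolve dependencies.

```python
def parse_version(text):
    """Turn '0.4.3' or 'v0.4.3' into a tuple of ints such as (0, 4, 3)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def caret_upper_bound(base):
    """Exclusive upper bound of a caret constraint: bump the first non-zero part."""
    for index, part in enumerate(base):
        if part != 0:
            return base[:index] + (part + 1,) + (0,) * (len(base) - index - 1)
    # All-zero bases (e.g. ^0.0.0) have special rules that this sketch skips.
    raise ValueError("all-zero caret base is not handled in this sketch")


def satisfies_caret(version, base):
    """True if `version` falls inside the range implied by ^base."""
    v, b = parse_version(version), parse_version(base)
    return b <= v < caret_upper_bound(b)


if __name__ == "__main__":
    # Versions taken from the task description.
    for label, version in [("ROCm fork", "0.4.0"),
                           ("vllm", "v0.4.3"),
                           ("vllm", "0.5.2")]:
        ok = satisfies_caret(version, "0.4.2")
        print(f"{label} {version}: satisfies ^0.4.2 -> {ok}")
```

Under this reading, neither the fork's 0.4.0 nor upstream 0.5.2 falls inside the range that huggingfaceserver declares, which is why either choice needs a change on the kserve side.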

In a previous task we stumbled into issues while trying to install vllm: https://phabricator.wikimedia.org/T354870#9935109

We followed the process in the vllm docs to build from source based on our pytorch-rocm-base-image. All the examples and instructions we have found use python 3.9 while we are using a bookworm base image with python 3.11.
The instructions in the aforementioned link mention the following (after installing torch+rocm):

cd vllm
pip install -U -r requirements-rocm.txt
python setup.py install # This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation

I've modified the above into a more blubber-friendly version for testing. After cloning the repo in a separate command, I use the following requirements.txt file:

-r /srv/app/vllm/requirements-rocm.txt
-e /srv/app/vllm/

However, the build fails because it seems to expect CUDA_HOME to be set:

  × Getting requirements to build wheel did not run successfully.
832.6   │ exit code: 1
832.6   ╰─> [20 lines of output]
832.6       Traceback (most recent call last):
832.6         File "/usr/local/lib/python3.11/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
832.6           main()
832.6         File "/usr/local/lib/python3.11/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
832.6           json_out['return_val'] = hook(**hook_input['kwargs'])
832.6                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
832.6         File "/usr/local/lib/python3.11/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
832.6           return hook(config_settings)
832.6                  ^^^^^^^^^^^^^^^^^^^^^
832.6         File "/tmp/pip-build-env-o4me9kcq/overlay/local/lib/python3.11/dist-packages/setuptools/build_meta.py", line 327, in get_requires_for_build_wheel
832.6           return self._get_build_requires(config_settings, requirements=[])
832.6                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
832.6         File "/tmp/pip-build-env-o4me9kcq/overlay/local/lib/python3.11/dist-packages/setuptools/build_meta.py", line 297, in _get_build_requires
832.6           self.run_setup()
832.6         File "/tmp/pip-build-env-o4me9kcq/overlay/local/lib/python3.11/dist-packages/setuptools/build_meta.py", line 313, in run_setup
832.6           exec(code, locals())
832.6         File "<string>", line 406, in <module>
832.6         File "<string>", line 312, in get_vllm_version
832.6         File "<string>", line 282, in get_nvcc_cuda_version
832.6       AssertionError: CUDA_HOME is not set
832.6       [end of output]
832.6   
832.6   note: This error originates from a subprocess, and is likely not a problem with pip.
832.6 error: subprocess-exited-with-error

The above process does seem too hacky, so even if it works we'll have to make sure it is stable enough for us to use without being a real pain to maintain/update.
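The traceback paths under /tmp/pip-build-env-... show that setup.py ran inside pip's isolated build environment. One possible explanation, which is an assumption and not confirmed here, is that the torch visible in that environment is not the ROCm build, so the build falls through to the CUDA code path. A minimal sketch for checking which backend a given torch install reports, using the `torch.version.hip` and `torch.version.cuda` attributes (the helper name is hypothetical):

```python
import importlib


def describe_torch_backend(torch_version):
    """Classify a torch build from its `torch.version` namespace."""
    if getattr(torch_version, "hip", None):
        return "rocm"
    if getattr(torch_version, "cuda", None):
        return "cuda"
    return "cpu"


def main():
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        print("torch is not importable in this environment")
        return
    print("torch", torch.__version__, "->", describe_torch_backend(torch.version))


if __name__ == "__main__":
    main()
```

Running this both in the image's normal interpreter and from within the build step would show whether the two see the same torch. If they differ, disabling build isolation (pip's --no-build-isolation flag) may be worth a try.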

Expected outcome:
An expected outcome of this task would be to have a model (e.g. gemma2) served from an image with the vllm engine, which would be much faster than the current one, which uses the huggingface backend.

Event Timeline

isarantopoulos renamed this task from [LLM] Use vllm with rocm in huggingface image to Use vllm for ROCm in huggingface image .Jul 16 2024, 1:41 PM
isarantopoulos added a project: Lift-Wing.
isarantopoulos updated the task description. (Show Details)
isarantopoulos renamed this task from Use vllm for ROCm in huggingface image to [LLM] Use vllm for ROCm in huggingface image .Jul 22 2024, 4:17 PM

Change #1072194 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/docker-images/production-images@master] (WIP) amd-pytorch: add vllm for ROCm to pytorch 2.3

https://gerrit.wikimedia.org/r/1072194

At the moment I'm trying to build vllm on top of the pytorch base image according to this piece of documentation.
A couple of days ago an official rocm/vllm image was released on Docker Hub, but it is based on ubuntu and seems to be built for MI300 and python 3.9. The image is 22GB compressed!

My suggestion would be to build vllm in the pytorch base image in production images.
Building vllm with ROCm seems quite unstable at the moment, so it makes sense to build it there: vllm and pytorch are intended to be used together and their version requirements depend on each other.
The other option would be to do it in the huggingface image via blubber (which I tried and failed), but blubber commands don't allow that much freedom (and rightly so).

I'm facing this issue while building vllm with the documentation instructions. It seems to be a permission-related issue (the same one I faced while trying flash attention).

2024-09-12 13:48:29 [docker-pkg-build] INFO - Traceback (most recent call last):
 (drivers.py:106)
2024-09-12 13:48:29 [docker-pkg-build] INFO -   File "/home/somebody/vllm/setup.py", line 11, in <module>
 (drivers.py:106)
2024-09-12 13:48:29 [docker-pkg-build] INFO -     import torch
 (drivers.py:106)
2024-09-12 13:48:29 [docker-pkg-build] INFO -   File "/opt/lib/python/base-packages/torch/__init__.py", line 237, in <module>
 (drivers.py:106)
2024-09-12 13:48:29 [docker-pkg-build] INFO -     from torch._C import *  # noqa: F403
 (drivers.py:106)
2024-09-12 13:48:29 [docker-pkg-build] INFO -     ^^^^^^^^^^^^^^^^^^^^^^
 (drivers.py:106)
2024-09-12 13:48:29 [docker-pkg-build] INFO - ImportError: libamdhip64.so: cannot enable executable stack as shared object requires: Invalid argument
 (drivers.py:106)
2024-09-12 13:48:29 [docker-pkg-build] ERROR - Build command failed with exit code 1: The command '/bin/sh -c python3 ~/vllm/setup.py develop --prefix=/opt/lib/python/base-packages' returned a non-zero code: 1 (drivers.py:97)
2024-09-12 13:48:29 [docker-pkg-build] ERROR - Building image docker-registry.wikimedia.org/amd-pytorch23:2.3.0rocm6.0-4 failed - check your Dockerfile: Building image docker-registry.wikimedia.org/amd-pytorch23:2.3.0rocm6.0-4 failed (image.py:206)
Traceback (most recent call last):

Will try first as root to find out if the issue still exists and we can discuss permissions/users afterwards.
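The "cannot enable executable stack as shared object requires" message comes from the dynamic loader: the library's PT_GNU_STACK program header asks for an executable stack and the loader could not grant it. Whether that is down to the user, the build sandbox, or the kernel is not established in this task. As a first diagnostic, the sketch below reads that flag from an ELF file. It handles only 64-bit little-endian ELF, and the function name is hypothetical.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type carrying the stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def requests_executable_stack(data):
    """Return True/False from the PT_GNU_STACK flags, or None if it is absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")
    # ELF64 header fields: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for index in range(e_phnum):
        # Each ELF64 program header starts with p_type and p_flags (4 bytes each).
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + index * e_phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None
```

Usage would be along the lines of `requests_executable_stack(open(path, "rb").read())` with `path` pointing at the libamdhip64.so in the image. `readelf -lW` on the same file shows the same information in its GNU_STACK row (RWE versus RW).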

isarantopoulos set the point value for this task to 8.

I made an attempt to build the updated image exactly as defined in https://github.com/ROCm/vllm/blob/main/Dockerfile.rocm.
The idea is to build it using the ubuntu base and then swap the base for debian bookworm.

At the moment I'm facing this error (after ~4.5h of build time!!). Full build log available in paste.

ERROR: failed to solve: process "/bin/sh -c git clone ${PYTORCH_REPO} pytorch     && cd pytorch && git checkout ${PYTORCH_BRANCH} && git submodule update --init --recursive     && python tools/amd_build/build_amd.py     && CMAKE_PREFIX_PATH=$(python3 -c 'import sys; print(sys.prefix)') python3 setup.py bdist_wheel --dist-dir=dist     && pip install dist/*.whl     && cd ..     && git clone ${PYTORCH_VISION_REPO} vision     && cd vision && git checkout ${PYTORCH_VISION_BRANCH}     && python3 setup.py bdist_wheel --dist-dir=dist" did not complete successfully: exit code: 132
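Exit code 132 follows the shell convention of 128 plus a signal number, and 132 minus 128 is 4, which is SIGILL (illegal instruction) on Linux. The sketch below decodes such exit codes; the function name is hypothetical. Why an illegal instruction was hit is not shown in the log. One possibility, stated here as an assumption, is that some step executed code compiled for CPU features the build host does not have.

```python
import signal


def signal_from_exit_code(code):
    """Map a shell exit code above 128 to a signal name, or None if it is not one."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


if __name__ == "__main__":
    for code in (1, 132, 137):
        print(code, "->", signal_from_exit_code(code))
```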
isarantopoulos raised the priority of this task from Medium to High.Nov 26 2024, 3:34 PM

I tried to follow the instructions here and also this ROCm/triton doc to build vllm from source. I installed PyTorch, and when building Triton flash attention for ROCm I got the error below. Full traceback in paste.

      File "/tmp/pip-build-env-ar2ty8ae/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
        cmd_obj.run()
      File "<string>", line 361, in run
      File "<string>", line 473, in build_extension
      File "/usr/lib/python3.11/subprocess.py", line 413, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'TritonRelBuildWithAsserts', '-j192']' returned non-zero exit status 1.
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for triton

I successfully built Triton flash attention using a miniconda env. However, when attempting to build vllm from source, I encountered an error related to the environment variable HIP_ROOT_DIR:

(since my copy functionality in tmux isn't working, I paste a screenshot instead):

vllm_build_error.png (920×2 px, 1 MB)

I tried setting HIP_ROOT_DIR to both /opt/rocm and /opt/rocm/bin, but the same issue (HIP_ROOT_DIR not found or specified) persisted.

An update on the task is available in this paste

Since then, the ROCm installation on ml-lab has been fixed in T381567: Debian hipcc package conflicts with hipcc from AMD's ROCm repository, so we can retry building vllm from source.

Waiting for the outcome of T385173: Use rocm/vllm image on Lift Wing, which should make vllm usage easier.

Change #1072194 abandoned by Ilias Sarantopoulos:

[operations/docker-images/production-images@master] (WIP) amd-pytorch: add vllm for ROCm to pytorch 2.3

https://gerrit.wikimedia.org/r/1072194

We are abandoning this in favor of T385173: Use rocm/vllm image on Lift Wing. Instead of building custom wheels we will port the vllm image from an ubuntu-based image to a debian-based one.

isarantopoulos changed the task status from Declined to Resolved.Apr 17 2025, 11:23 AM
isarantopoulos moved this task from Blocked to 2025-2026 Q1 Done on the Machine-Learning-Team board.