
ml-lab: Update pytorch-rocm to a current ROCm-compatible release
Open, Needs Triage, Public

Description

Summary

pytorch-rocm on ml-lab1002.eqiad.wmnet is outdated. Update it to a current ROCm-compatible release so that MI210 inference experiments have access to recent model support and bug fixes. The environment to upgrade lives under /srv/pytorch-rocm/. While doing so, we can also explore installing vLLM from one of the pre-built wheels mentioned in the documentation: https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#create-a-new-python-environment. A quick pass through these docs suggests that both gfx90a (MI210) and gfx942 (MI300X) are supported.
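
As a rough post-upgrade sanity check, something like the sketch below could be run inside the upgraded environment to confirm that PyTorch sees the GPU and that its gfx target is one of the two named above. This is only an illustration: the helper names and the EXPECTED_TARGETS set are hypothetical, the set just mirrors the targets mentioned in this task rather than any authoritative support matrix, and reading gcnArchName from the device properties assumes a ROCm build of PyTorch that exposes that attribute.

```python
"""Sketch: check that the local GPU's gfx target is one this task expects."""

# Assumption: this set only mirrors the two targets named in the task,
# it is not an authoritative list of what any wheel supports.
EXPECTED_TARGETS = {"gfx90a", "gfx942"}


def base_target(arch_name: str) -> str:
    """Strip feature suffixes, e.g. 'gfx90a:sramecc+:xnack-' becomes 'gfx90a'."""
    return arch_name.split(":", 1)[0].strip().lower()


def is_expected_target(arch_name: str) -> bool:
    """Return True if the base gfx target is in EXPECTED_TARGETS."""
    return base_target(arch_name) in EXPECTED_TARGETS


def report_local_gpu() -> str:
    """Describe what the installed torch build sees, without raising."""
    try:
        import torch
    except ImportError:
        return "torch is not importable in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} sees no GPU"
    props = torch.cuda.get_device_properties(0)
    # Assumption: ROCm builds expose gcnArchName; fall back gracefully if not.
    arch = getattr(props, "gcnArchName", None)
    hip = getattr(torch.version, "hip", None)
    if arch is None:
        return f"torch {torch.__version__} (hip={hip}): no gcnArchName reported"
    status = "expected" if is_expected_target(arch) else "unexpected"
    return f"torch {torch.__version__} (hip={hip}): {arch} is an {status} target"


if __name__ == "__main__":
    print(report_local_gpu())
```

The string parsing is kept separate from the torch query so that the target check can be exercised on any machine, including ones without a GPU.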

Acceptance criteria