The main issue is that pip-installing torch with ROCm support brings in ~8.4 GB of libraries (ROCm version 5.4.0).
This is how the space is used:
```
somebody@4b8595f6b014:/srv/revert_risk_model$ du -hs /opt/lib/python/site-packages/torch/lib/* | sort -h | tail
390M    /opt/lib/python/site-packages/torch/lib/libMIOpen.so
497M    /opt/lib/python/site-packages/torch/lib/libtorch_cpu.so
596M    /opt/lib/python/site-packages/torch/lib/librocsparse.so
693M    /opt/lib/python/site-packages/torch/lib/librocfft-device-0.so
696M    /opt/lib/python/site-packages/torch/lib/librocfft-device-1.so
715M    /opt/lib/python/site-packages/torch/lib/libtorch_hip.so
721M    /opt/lib/python/site-packages/torch/lib/librocfft-device-2.so
764M    /opt/lib/python/site-packages/torch/lib/librocfft-device-3.so
1.1G    /opt/lib/python/site-packages/torch/lib/librocsolver.so
1.4G    /opt/lib/python/site-packages/torch/lib/rocblas
```
In particular, the rocblas directory bundles kernel libraries for every AMD GPU architecture that ROCm supports, most of which we don't need.
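A quick way to confirm this is to inventory the GPU targets bundled in rocblas and compare them with what our hosts actually report. This is a sketch only: the `library` subdirectory, the `gfx*` file-name pattern, and the `/opt/rocm` install path are assumptions about how rocblas ships its kernels.

```
# Which GPU archs does the bundled rocblas carry? (file-name pattern assumed)
ls /opt/lib/python/site-packages/torch/lib/rocblas/library \
  | grep -oE 'gfx[0-9a-f]+' | sort | uniq -c

# Which arch do our hosts actually have? (requires ROCm tools on the host)
/opt/rocm/bin/rocm_agent_enumerator
```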
This task should investigate whether it is possible to reduce the size of the package, perhaps by building torch with our own infrastructure.
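Besides rebuilding, a cheaper experiment could be pruning the kernel files for architectures we don't use from an already-installed package. The file-name pattern and the arch to keep (gfx90a below) are assumptions, and torch would need to be re-tested afterwards:

```
# Sketch only: drop rocblas kernel files for every arch except the one we keep.
cd /opt/lib/python/site-packages/torch/lib/rocblas/library
KEEP=gfx90a   # assumed target arch; check rocm_agent_enumerator output first
ls | grep -E 'gfx[0-9a-f]+' | grep -v "$KEEP" | xargs -r rm --
du -hs .      # verify the directory actually shrank
```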
Useful links:
- How to build torch with ROCm support: https://lernapparat.de/pytorch-rocm
- Layers of the rocm/pytorch image (pay attention to the use of PYTORCH_ROCM_ARCH with multiple/selected archs; see the build sketch after this list).
- https://medium.com/@rafaelmanzanom/ditching-cuda-for-amd-rocm-for-more-accessible-llm-inference-ryzen-apus-edition-92c3649f8f7d
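For reference, a minimal build sketch along the lines of the first two links, restricting PYTORCH_ROCM_ARCH to a single target. The chosen arch (gfx90a) is an assumption, and in practice this would run inside a ROCm-enabled container or host with the build dependencies installed:

```
# Sketch: build a ROCm torch wheel for one GPU arch only.
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
python tools/amd_build/build_amd.py          # hipify CUDA sources for ROCm
USE_ROCM=1 PYTORCH_ROCM_ARCH=gfx90a \
  python setup.py bdist_wheel                # wheel lands in dist/
```

If that works, comparing the resulting wheel's torch/lib footprint against the ~8.4 GB above would tell us how much of the size is multi-arch support.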