As an engineer,
I want to load models in 8-bit (int8) precision using the transformers library, so that I can run larger LLMs that don't fit in the GPU's VRAM (e.g. aya-23-35B).
We want to explore the post-training quantization options available for ROCm using PyTorch models. For now, all experimentation should be done with the aya-expanse-8B and aya-expanse-32B models, as these are the ones we want to validate. Below is a screenshot from the Hugging Face docs showing the current status of the quantization libraries and which of them support ROCm.
The AMD ROCm documentation is another good resource for model quantization on ROCm.
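As a starting point, the loading step described above can be sketched with the bitsandbytes int8 backend via `BitsAndBytesConfig`. This is a minimal sketch, assuming bitsandbytes works on the target ROCm build (to be confirmed against the compatibility table above); the model id `CohereForAI/aya-expanse-8b` is an assumption and should be checked on the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# int8 post-training quantization via the bitsandbytes backend.
# ROCm support for this backend is an assumption to be validated.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "CohereForAI/aya-expanse-8b"  # assumed Hub id; verify before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs automatically
)
```

With `load_in_8bit=True` the linear layers are stored in int8, roughly halving memory versus fp16, which is what makes the 32B model a realistic target on a single accelerator.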

