In this task, we aim to enable GPU inference for the RevertRisk-Multilingual model, run load tests, and compare inference times against the CPU-only setup.
In T355656, we are implementing batch inference for revertrisk-multilingual. Once both tasks are complete, we plan to test batch inference with a GPU.
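For the timing comparison, a small harness that measures per-batch latency and throughput for any prediction callable could be used against both the CPU and GPU deployments. The sketch below is a hypothetical illustration, not the actual load-testing setup; `predict` stands in for a call to the RevertRisk-Multilingual model.

```python
import time
from statistics import mean

def benchmark(predict, batches, warmup=1):
    """Time `predict` over a list of input batches.

    Hypothetical harness for CPU-vs-GPU comparison; `predict` is any
    callable taking one batch (e.g. a request to the model server).
    Returns mean per-batch latency and overall throughput.
    """
    # Untimed warm-up calls, so one-time setup cost (model load,
    # CUDA context creation) does not skew the measurements.
    for batch in batches[:warmup]:
        predict(batch)
    latencies = []
    for batch in batches:
        t0 = time.perf_counter()
        predict(batch)
        latencies.append(time.perf_counter() - t0)
    total_items = sum(len(b) for b in batches)
    return {
        "mean_latency_s": mean(latencies),
        "throughput_items_per_s": total_items / sum(latencies),
    }

# Stand-in predictor for illustration; in practice this would call the
# revertrisk-multilingual model on a CPU-only or GPU-backed deployment.
dummy_predict = lambda batch: [0.5 for _ in batch]
stats = benchmark(dummy_predict, [[1, 2, 3, 4]] * 5)
```

Running the same harness against both deployments with identical batches would give directly comparable latency and throughput numbers.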