Problem: When we use PyTorch, libomp is usually what parallelizes the inference steps. The library is not container-aware: it reads the CPU count of the underlying k8s worker node (via sysfs) to decide how many threads to spawn. Since a container is typically granted far fewer CPUs than the node has, the excess threads trigger CPU throttling (causing high latency, etc.).
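A minimal sketch of the mismatch, assuming a cgroup v2 container (the `/sys/fs/cgroup/cpu.max` path is a v2 convention; v1 uses different files):

```python
import os

# What libomp effectively sees: the CPU count of the whole k8s worker node.
print(os.cpu_count())  # e.g. 64

# What the container is actually allowed to use: its cgroup CPU quota
# ("max" in the first field means no limit was set).
with open("/sys/fs/cgroup/cpu.max") as f:
    quota, period = f.read().split()
print(quota, period)  # e.g. "200000 100000", i.e. 2 CPUs
```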
The workaround for the moment is to set the OMP_NUM_THREADS env variable, which tells libomp how many threads to use, but this is fragile because the value has to be kept in sync with the container's CPU settings by hand. Sooner or later we will increase the number of CPUs assigned to a container (in a pod) in k8s without adjusting OMP_NUM_THREADS as well, and then waste a lot of time tracking down the source of the inconsistency.
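Sketched in Python, the workaround looks roughly like this; the value `"2"` is a placeholder that has to be manually aligned with the pod's CPU limit:

```python
import os

# Must be set before torch (and hence libomp) is first initialized.
os.environ["OMP_NUM_THREADS"] = "2"  # placeholder; must match the pod's CPU limit

import torch

print(torch.get_num_threads())  # reflects the pinned value with the OpenMP backend
```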
We should try to find a way in the Python code to automatically set OMP_NUM_THREADS when needed.
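One possible shape for that, as a sketch: read the container's CPU quota from the cgroup filesystem at startup and set OMP_NUM_THREADS before torch is imported. The helper `container_cpu_count` is hypothetical (not an existing API), and the cgroup v1/v2 paths are assumptions about where k8s mounts them:

```python
import math
import os


def container_cpu_count() -> int:
    """Best-effort CPU count that honours the container's cgroup quota."""
    try:
        # cgroup v2: one file holding "<quota> <period>", or "max <period>"
        # when no limit is set.
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
        if quota != "max":
            return max(1, math.ceil(int(quota) / int(period)))
    except OSError:
        pass
    try:
        # cgroup v1: quota and period live in separate files; -1 means unlimited.
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota_us = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period_us = int(f.read())
        if quota_us > 0:
            return max(1, math.ceil(quota_us / period_us))
    except OSError:
        pass
    # No quota found: fall back to the host view.
    return os.cpu_count() or 1


# setdefault keeps an explicit OMP_NUM_THREADS override working; this must
# run before the first `import torch` so libomp picks the value up.
os.environ.setdefault("OMP_NUM_THREADS", str(container_cpu_count()))

import torch  # noqa: E402
```

This way the thread count follows the pod's CPU limit automatically, while an explicitly exported OMP_NUM_THREADS still wins when someone needs to override it.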