
Automatically set libomp's number of threads when using PyTorch
Open, Needs Triage · Public · 3 Estimated Story Points

Description

Problem: When we use PyTorch, libomp is usually used to parallelize some inference steps. The library is not container-aware: it reads the number of CPUs of the underlying k8s worker (via sysfs) and uses that to size its thread pool. Since a container typically runs with far fewer CPUs than the worker node, the resulting high thread count triggers CPU throttling (causing high latency, etc.).

The workaround for the moment is to set the OMP_NUM_THREADS env variable, which tells libomp how many threads to use, but this is not really flexible since the value has to stay coupled to the container's CPU settings. Sooner or later we will increase the number of CPUs assigned to a container (in a pod) in k8s without adjusting OMP_NUM_THREADS as well, and then waste a lot of time figuring out the source of the inconsistency.

We should try to find a way in the Python code to automatically set OMP_NUM_THREADS when needed.
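For illustration, a minimal sketch of what "automatically" could look like: derive the CPU count from the container's cgroup CPU quota (the limit k8s actually enforces) instead of the host CPU count that libomp sees. The file paths and fallback logic below are assumptions (cgroup v1 and v2 differ), not the actual resource_utils.py implementation:

```python
import math
import os


def get_container_cpu_count() -> int:
    """Hypothetical helper: derive the CPU count from the cgroup CPU
    quota applied to the container, not the host's CPU count."""
    try:
        # cgroup v2: cpu.max contains "<quota> <period>" or "max <period>"
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
        if quota != "max":
            return max(1, math.ceil(int(quota) / int(period)))
    except OSError:
        pass
    try:
        # cgroup v1: quota is -1 when no limit is set
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota_us = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period_us = int(f.read())
        if quota_us > 0:
            return max(1, math.ceil(quota_us / period_us))
    except OSError:
        pass
    # No quota found: fall back to the host CPU count, like libomp does.
    return os.cpu_count() or 1


def set_omp_num_threads() -> None:
    # Must run before torch (or numpy) initializes OpenMP.
    os.environ.setdefault("OMP_NUM_THREADS", str(get_container_cpu_count()))
```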

Event Timeline

Change 1011130 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] resource_utils.py: add a function to automatically set OMP_NUM_THREADS

https://gerrit.wikimedia.org/r/1011130

Change 1011131 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] readability: set automatically OMP_NUM_THREADS

https://gerrit.wikimedia.org/r/1011131

Change 1011130 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] resource_utils.py: add a function to automatically set OMP_NUM_THREADS

https://gerrit.wikimedia.org/r/1011130

The main issue with the approach of setting the OMP_NUM_THREADS variable is that, IIUC, it needs to be set before torch is imported/initialized, which is not straightforward with the current code. For example, for readability's model server we would have to restructure the code to call the set_omp_num_threads() helper function before anything else.
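The ordering constraint, roughly (a sketch; the thread count here is a placeholder):

```python
import os

# OpenMP reads OMP_NUM_THREADS when the runtime is initialized, so the
# variable must already be in the environment when torch is imported.
# Setting it after "import torch" has no effect on the thread pool.
os.environ.setdefault("OMP_NUM_THREADS", "4")  # placeholder value

import torch  # noqa: E402  (deliberately imported after the env tweak)
```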

After reading https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html we decided to try setting torch's threads programmatically instead, a much easier approach. The downside is that other libs, like numpy, do use OMP_NUM_THREADS, so even if the torch solution works it won't be a completely generic fix.
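The programmatic variant is roughly the following (torch.set_num_threads caps torch's intra-op thread pool; the import path for the get_cpu_count helper is an assumption):

```python
import torch

from resource_utils import get_cpu_count  # module path is an assumption

# Cap torch's intra-op thread pool at runtime, after import, instead of
# relying on OMP_NUM_THREADS being set before the process started.
torch.set_num_threads(get_cpu_count())
```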

Change 1011131 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] readability: set torch threads using get_cpu_count

https://gerrit.wikimedia.org/r/1011131

Change rOMWC1012398c1b09 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] readability: add entrypoint to set environment variables

https://gerrit.wikimedia.org/r/1012398

Tried setting the number of threads via torch's API directly in the code, but unfortunately it didn't work (at least with 1.13.0). Trying a different road with https://gerrit.wikimedia.org/r/1012398, which in theory is more generic and future-proof.

calbon set the point value for this task to 3.

Change rOMWC1012398c1b09 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] readability: add entrypoint to set environment variables

https://gerrit.wikimedia.org/r/1012398

Change 1012700 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update Docker image for Readability

https://gerrit.wikimedia.org/r/1012700

Change 1012701 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] Rename entrypoint.sh to ci_entrypoint.sh

https://gerrit.wikimedia.org/r/1012701

Change 1012711 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] Set most of the model servers to run a specific entrypoint.sh

https://gerrit.wikimedia.org/r/1012711

Change 1012700 merged by Elukey:

[operations/deployment-charts@master] ml-services: update Docker image for Readability

https://gerrit.wikimedia.org/r/1012700

Change 1012701 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Rename entrypoint.sh to ci_entrypoint.sh

https://gerrit.wikimedia.org/r/1012701

We decided to go for https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1012711

The idea is to source common env variables from a bash file, and add another general-purpose script to run the model server (set as entrypoint in the blubber configs). This should keep us future-proof: by default we'll use the generic entrypoint.sh, and if a new model server requires a more custom/specific one, we'll just source the common bash file in there as well.
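Roughly, the shape of the generic entrypoint (a sketch, not the merged script; the file paths and the name of the shared bash file are assumptions):

```bash
#!/bin/bash
# Generic entrypoint.sh used by the blubber configs.

# Shared env setup (OMP_NUM_THREADS and friends). A model server that
# needs a custom entrypoint would source this same file.
source /srv/app/common_settings.sh  # hypothetical path/name

# Replace the shell with the model server process.
exec python3 model_server/model.py "$@"  # hypothetical path
```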

Next steps:

  • Deploy the new images to staging and verify that everything works as expected.
  • Rollout to prod.

Change 1012711 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Set most of the model servers to run a specific entrypoint.sh

https://gerrit.wikimedia.org/r/1012711

Change #1017292 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update RR ML/Wikidata's Docker images

https://gerrit.wikimedia.org/r/1017292

Change #1017292 merged by Elukey:

[operations/deployment-charts@master] ml-services: update RR ML/Wikidata's Docker images

https://gerrit.wikimedia.org/r/1017292

Thanks to Aiko, who fixed some issues with RR Wikidata and ML, the new code is now deployed to all the model servers that used to have OMP_NUM_THREADS explicitly set in deployment-charts. The model servers work fine and their performance is good.

In the team meeting we agreed not to roll out the new entrypoint.sh change everywhere now, but to let it go out gradually as future deployments happen.

The new entrypoint has been rolled out as part of the migration to the mw-int-ro endpoint. Task done!