We're going to upgrade the Revert Risk Language-agnostic model server docker images to KServe 0.11
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | elukey | T337213 Update to KServe 0.11
Resolved | | achou | T347550 Upgrade Revert Risk Language-agnostic docker images to KServe 0.11
Open | BUG REPORT | None | T349844 Increased latencies with Kserve 0.11.1 (cgroups v2)
Resolved | | MunizaA | T350389 Upgrade xgboost in knowledge_integrity
Open | | isarantopoulos | T353461 Allow to set Catboost's threads in readability-liftwing
Event Timeline
Change 964559 had a related patch set uploaded (by Elukey; author: Elukey):
[machinelearning/liftwing/inference-services@main] revert-risk: upgrade to KServe 0.11.1
Change 964559 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] revert-risk: upgrade to KServe 0.11.1
Change 965057 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: upgrade kserve to 0.11.1 for revertrisk
Change 965057 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: upgrade kserve to 0.11.1 for revertrisk
When testing the model in ml-staging, the following error was encountered, resulting in the pod being in a CrashLoopBackOff state:
```
Message: Traceback (most recent call last):
  File "/srv/revert-risk-model/model-server/model.py", line 6, in <module>
    import kserve
  File "/opt/lib/python/site-packages/kserve/__init__.py", line 18, in <module>
    from .model_server import ModelServer
  File "/opt/lib/python/site-packages/kserve/model_server.py", line 25, in <module>
    from ray import serve as rayserve
  File "/opt/lib/python/site-packages/ray/__init__.py", line 136, in <module>
    from ray._private.worker import (  # noqa: E402,F401
  File "/opt/lib/python/site-packages/ray/_private/worker.py", line 50, in <module>
    import ray._private.parameter
  File "/opt/lib/python/site-packages/ray/_private/parameter.py", line 4, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
```
The error No module named 'pkg_resources' typically indicates an issue with the installation or configuration of the setuptools package. I added python3-setuptools to the Blubber file for the production image, which resolved the problem.
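For context, pkg_resources is shipped by setuptools rather than the Python standard library, which is why a slim base image without python3-setuptools breaks ray's import chain. A quick way to check for it without triggering the failing import itself:

```python
import importlib.util

# pkg_resources comes from setuptools, not the stdlib; if setuptools is
# absent, "import pkg_resources" raises ModuleNotFoundError, which is
# what took down the ray import chain inside kserve.
def has_pkg_resources() -> bool:
    return importlib.util.find_spec("pkg_resources") is not None

print(has_pkg_resources())
```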
Change 965094 had a related patch set uploaded (by AikoChou; author: AikoChou):
[machinelearning/liftwing/inference-services@main] revert-risk: add python3-setuptools to revertrisk-la blubber file
Change 965094 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] revert-risk: add python3-setuptools to revertrisk-la blubber file
Change 965146 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: update revertrisk-la docker image
Change 965146 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: update revertrisk-la docker image
Change 965532 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: deploy a revertrisk-la that uses kserve 0.10 in staging
Change 965532 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: deploy a revertrisk-la that uses kserve 0.10 in staging
@achou found a latency regression when load testing RR-LA with KServe 0.11 on ml-staging. After some digging, we found that the Python process running KServe 0.11 spawns far more threads than before (~200 vs ~10) and uses far more CPU time, ending up severely throttled by Kubernetes. Most of the CPU time in the new threads is spent in libgomp (the OpenMP runtime), used by XGBoost (brought in by the Knowledge Integrity package).
From a quick check of RR-LA with KServe 0.10 we didn't see a change in XGBoost's or libgomp's versions, so the current theory is that some change (likely in a dependency) triggered more parallelism, which in turn caused the extra CPU usage and the throttling.
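A minimal sketch of the kind of check behind the ~200 vs ~10 thread comparison: on Linux, the kernel reports a process's thread count in /proc/&lt;pid&gt;/status.

```python
import os
from typing import Optional

# Read the kernel's thread count for a process from /proc/<pid>/status
# (the "Threads:" field) -- a simple way to compare the KServe 0.10 and
# 0.11 pods. Linux-only, since it relies on procfs.
def thread_count(pid: Optional[int] = None) -> int:
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("Threads: field not found")

print(thread_count())
```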
As far as I can see from the KI code, we use XGBoost's DMatrix:
https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/main/knowledge_integrity/models/revertrisk.py#L103
From the XGBoost Python docs I see the following:
nthread (integer, optional) – Number of threads to use for loading data when parallelization is applicable. If -1, uses maximum threads available on the system.
My impression is that, for some reason, XGBoost now picks up the number of CPU cores available on the bare-metal Kubernetes node (not the container) and creates that many threads.
My theory may not be correct, though: https://github.com/dmlc/xgboost/pull/7654 should be included in XGBoost 1.6+, and KI pulls in 1.7.6 afaics.
After re-reading https://github.com/dmlc/xgboost/issues/7653 I am wondering if setting nthread=-1 makes any difference in our use case.
Answer: it seems that -1 is the value used when we don't specify anything.
I found https://github.com/dmlc/xgboost/pull/9651, released 2 days ago, which is exactly what we need: the code that gets the maximum number of CPUs that a container offers (represented by the cgroup) is not compatible with what we use now (cgroups v2).
The fix should be included in XGBoost 2.0.1 (not yet released), which is probably a big jump for KI :(
Remaining to understand: why did we see this change in behavior from XGBoost?
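To illustrate the mismatch: native probes like os.cpu_count() report the host's cores, while the container's actual CPU quota lives in the cgroup files. A hedged sketch, assuming cgroups v2 at its standard mount point:

```python
import os

# cgroups v2 exposes the CPU quota as "<quota> <period>" in cpu.max
# ("max" means unlimited). Code that only knows the cgroups v1 files
# (cpu.cfs_quota_us / cpu.cfs_period_us) finds nothing there and falls
# back to the host's full core count -- hence the thread explosion.
def cgroup_v2_cpu_limit(path: str = "/sys/fs/cgroup/cpu.max"):
    try:
        quota, period = open(path).read().split()
    except (OSError, ValueError):
        return None  # file missing or unexpected format
    if quota == "max":
        return None  # no CPU limit configured
    return int(quota) / int(period)

print(os.cpu_count(), cgroup_v2_cpu_limit())
```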
Change 965666 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS for revertrisk-la
Change 965666 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS for revertrisk-la
Change 967442 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS in all revertrisk isvc
Change 967442 merged by Elukey:
[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS in all revertrisk isvc
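The patches above cap libgomp's thread pool via the environment. OMP_NUM_THREADS is read when the OpenMP runtime initializes, so it must be present before the OpenMP-linked library (xgboost) is loaded, which is why it is set in the pod spec rather than in Python after import. A sketch of the equivalent in-process guard:

```python
import os

# OMP_NUM_THREADS must be in the environment before the OpenMP runtime
# initializes (i.e. before importing xgboost); setting it afterwards has
# no effect. The real deployment sets it in the isvc container env.
os.environ["OMP_NUM_THREADS"] = "1"
print(os.environ["OMP_NUM_THREADS"])
```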
cgroups v2 support is here in the latest XGBoost patch release: https://github.com/dmlc/xgboost/releases/tag/v2.0.1 !
Knowledge Integrity's dependencies will need a change, as the current xgboost version constraint does not allow 2.0.1 to be installed.
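A tiny illustration of why the pin blocks the fix; the "&lt;2.0.0" upper bound here is an assumption standing in for KI's real xgboost constraint:

```python
# Compare dotted version strings as integer tuples. Under a hypothetical
# upper bound of "<2.0.0" (KI's actual constraint may differ), 2.0.1 is
# rejected, so the constraint must be relaxed before the fix can land.
def parse(v: str):
    return tuple(int(x) for x in v.split("."))

allowed = parse("2.0.1") < parse("2.0.0")
print(allowed)  # 2.0.1 does not satisfy the "<2.0.0" bound
```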
Change 975008 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[machinelearning/liftwing/inference-services@main] revert kserve upgrades
Change 975008 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] revert kserve upgrades
Change 975205 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[operations/deployment-charts@master] ml-services: rollback xgboost/catboost models to kserve 0.10
Change 975274 had a related patch set uploaded (by AikoChou; author: AikoChou):
[machinelearning/liftwing/inference-services@main] revert-risk: upgrade Kserve 0.11.1 and knowledge integrity 0.5.0
Change 975274 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] revert-risk: upgrade Kserve 0.11.1 and knowledge integrity 0.5.0
Change 975304 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: update revertrisk-la image and model binary
Change 975304 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: update revertrisk-la image and model binary
Update:
The revertrisk-la image (KServe 0.11.1 and Knowledge Integrity v0.5.0) with model binary v3 has been deployed to staging. I ran some load tests and can confirm the latency issue is fixed with xgboost 2.0.1, so there is no longer any need to set the OMP_NUM_THREADS env var manually.
Latency for the model servers on the Grafana dashboard (green is the old model server, yellow is the new one).
Change 976748 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):
[operations/deployment-charts@master] ml-services: update docker images to latest versions
Change 975205 abandoned by Ilias Sarantopoulos:
[operations/deployment-charts@master] ml-services: rollback xgboost/catboost models to kserve 0.10
Reason:
abandoned in favor of https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/976748 which has the latest updates
Change 976748 merged by jenkins-bot:
[operations/deployment-charts@master] ml-services: update docker images to latest versions
Upgraded the server and ran some load tests. Results are in line with past values:
```
wrk -c 4 -t 2 --timeout 3s -s revertrisk.lua https://inference.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict --header "Host: revertrisk-language-agnostic.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 10s test @ https://inference.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   102.54ms   26.00ms  224.48ms   88.29%
    Req/Sec    15.38      5.19     30.00     53.51%
  Latency Distribution
     50%   95.04ms
     75%  104.36ms
     90%  132.62ms
     99%  212.92ms
  299 requests in 10.01s, 109.63KB read
  Non-2xx or 3xx responses: 4
Requests/sec:     29.86
Transfer/sec:     10.95KB
thread 1 made 151 requests and got 149 responses
thread 2 made 152 requests and got 150 responses
```