Upgrade Revert Risk Multilingual docker images to KServe 0.11.2
Closed, Resolved | Public | 3 Estimated Story Points

Description

We're going to upgrade the Revert Risk Multilingual model server docker images to KServe 0.11.

Event Timeline

Change 964559 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] revert-risk: upgrade to KServe 0.11.1

https://gerrit.wikimedia.org/r/964559

Change 964559 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revert-risk: upgrade to KServe 0.11.1

https://gerrit.wikimedia.org/r/964559

Change 965057 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: upgrade kserve to 0.11.1 for revertrisk

https://gerrit.wikimedia.org/r/965057

Change 965057 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: upgrade kserve to 0.11.1 for revertrisk

https://gerrit.wikimedia.org/r/965057

elukey moved this task from In Progress to Unsorted on the Machine-Learning-Team board.
elukey subscribed.

Change 967442 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS in all revertrisk isvc

https://gerrit.wikimedia.org/r/967442

Change 967442 merged by Elukey:

[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS in all revertrisk isvc

https://gerrit.wikimedia.org/r/967442

isarantopoulos moved this task from Unsorted to In Progress on the Machine-Learning-Team board.

Ran some load tests

isaranto@deploy2002:~/pycharm/test/wrk$ wrk -c 1 -t 1 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict -d 60 --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.35s     1.20s    4.56s    81.82%
    Req/Sec     1.25      1.46     5.00     81.25%
  Latency Distribution
     50%  765.36ms
     75%    2.26s
     90%    2.88s
     99%    4.56s
  32 requests in 1.00m, 11.64KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
  Non-2xx or 3xx responses: 1
Requests/sec:      0.53
Transfer/sec:     198.40B
thread 1 made 34 requests and got 32 responses

wrk -c 2 -t 2 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict -d 60 --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.52s   888.01ms   3.78s    65.91%
    Req/Sec     0.32      0.68     2.00     88.00%
  Latency Distribution
     50%    1.36s
     75%    2.06s
     90%    2.76s
     99%    3.78s
  50 requests in 1.00m, 18.22KB read
  Socket errors: connect 0, read 0, write 0, timeout 6
  Non-2xx or 3xx responses: 2
Requests/sec:      0.83
Transfer/sec:     310.39B
thread 1 made 27 requests and got 25 responses
thread 2 made 26 requests and got 25 responses

wrk -c 4 -t 4 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict -d 60 --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
thread 3 created logfile wrk_3.log created
thread 4 created logfile wrk_4.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.07s   984.02ms   4.39s    63.41%
    Req/Sec     0.27      1.42    10.00     98.04%
  Latency Distribution
     50%    2.05s
     75%    2.86s
     90%    3.16s
     99%    4.39s
  51 requests in 1.00m, 18.65KB read
  Socket errors: connect 0, read 0, write 0, timeout 10
  Non-2xx or 3xx responses: 4
Requests/sec:      0.85
Transfer/sec:     317.72B
thread 1 made 15 requests and got 13 responses
thread 2 made 14 requests and got 13 responses
thread 3 made 14 requests and got 13 responses
thread 4 made 13 requests and got 12 responses

Results are comparable to the old load-tests done by Aiko.

Actually, there are differences compared to the old load tests, even when I run the tests for 10s like the ones @achou ran in the link above.

wrk -c 1 -t 1 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict  --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   486.39ms  409.13ms   1.25s    70.00%
    Req/Sec     2.22      1.56     5.00     66.67%
  Latency Distribution
     50%  359.72ms
     75%  795.97ms
     90%    1.25s
     99%    1.25s
  10 requests in 10.01s, 3.65KB read
  Non-2xx or 3xx responses: 1
Requests/sec:      1.00
Transfer/sec:     373.66B
thread 1 made 12 requests and got 10 responses
wrk -c 2 -t 2 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict  --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.07s   717.63ms   2.79s    72.22%
    Req/Sec     1.65      3.22    10.00     88.24%
  Latency Distribution
     50%    1.11s
     75%    1.56s
     90%    1.97s
     99%    2.79s
  17 requests in 10.02s, 6.22KB read
  Non-2xx or 3xx responses: 2
Requests/sec:      1.70
Transfer/sec:     636.13B
thread 1 made 10 requests and got 8 responses
thread 2 made 10 requests and got 9 responses

I'm investigating whether this is due to different input or something else.

Change 968715 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS and OMP_THREAD_LIMIT in rr-multilingual

https://gerrit.wikimedia.org/r/968715

Change 968715 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: set OMP_NUM_THREADS and OMP_THREAD_LIMIT in rr-multilingual

https://gerrit.wikimedia.org/r/968715

Here are some load test results after setting the OMP_NUM_THREADS and OMP_THREAD_LIMIT env vars. We found out that revertrisk-multilingual also uses catboost, which for some reason created a large number of additional threads when using KServe 0.11.
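One way to check whether a process is spawning extra threads is to read the "Threads:" field that Linux exposes in /proc/<pid>/status. The snippet below is only a minimal sketch of that check; the function names are hypothetical and it is not the procedure that was used in this task.

```python
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def thread_count(pid: str = "self") -> int:
    """Read the current thread count of a process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())
```

Calling thread_count() inside the model server process (or thread_count("<pid>") from a shell on the same host) before and after changing the env vars would show whether the thread count actually drops.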

  • run 10s
aikochou@deploy2002:~/wrk/rr$ wrk -c 1 -t 1 --timeout 5s -s script.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- input
thread 1 created logfile wrk_1.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   345.43ms  164.91ms 698.15ms   83.33%
    Req/Sec     2.69      1.65     5.00     53.85%
  Latency Distribution
     50%  374.78ms
     75%  484.13ms
     90%  499.09ms
     99%  698.15ms
  13 requests in 10.02s, 4.71KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:      1.30
Transfer/sec:     481.80B
thread 1 made 15 requests and got 13 responses
aikochou@deploy2002:~/wrk/rr$ wrk -c 2 -t 2 --timeout 5s -s script.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   508.37ms  164.28ms 791.71ms   60.00%
    Req/Sec     1.41      0.94     3.00     70.59%
  Latency Distribution
     50%  507.46ms
     75%  654.71ms
     90%  709.17ms
     99%  791.71ms
  17 requests in 10.02s, 6.16KB read
  Socket errors: connect 0, read 0, write 0, timeout 2
Requests/sec:      1.70
Transfer/sec:     629.79B
thread 1 made 11 requests and got 9 responses
thread 2 made 9 requests and got 8 responses
aikochou@deploy2002:~/wrk/rr$ wrk -c 4 -t 4 --timeout 5s -s script.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
thread 3 created logfile wrk_3.log created
thread 4 created logfile wrk_4.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   884.78ms  288.42ms   1.65s    71.43%
    Req/Sec     0.75      0.65     2.00     53.57%
  Latency Distribution
     50%  872.10ms
     75%    1.02s 
     90%    1.29s 
     99%    1.65s 
  28 requests in 10.02s, 10.15KB read
Requests/sec:      2.79
Transfer/sec:      1.01KB
thread 1 made 9 requests and got 7 responses
thread 2 made 8 requests and got 7 responses
thread 3 made 8 requests and got 7 responses
thread 4 made 8 requests and got 7 responses
  • run 1 min
aikochou@deploy2002:~/wrk/rr$ wrk -c 1 -t 1 -d 1m --timeout 5s -s script.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   807.87ms  812.49ms   4.27s    86.17%
    Req/Sec     2.07      1.56     5.00     69.51%
  Latency Distribution
     50%  503.23ms
     75%  940.67ms
     90%    2.07s 
     99%    4.27s 
  82 requests in 1.00m, 29.73KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:      1.36
Transfer/sec:     506.64B
thread 1 made 84 requests and got 82 responses

Both the average and the 99th-percentile latency are better than in the previous results from @isarantopoulos.

Compared with the old load tests, the first run (1 thread, 1 connection) and the second run (2 threads, 2 connections) performed slightly worse, with more timeouts and fewer total requests. However, the third run (4 threads, 4 connections) showed comparable performance.
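To compare runs without eyeballing mixed units (ms vs s), the "Latency" line of the wrk summary can be converted to milliseconds. This is a rough sketch that assumes the standard wrk summary layout shown in the outputs above; the helper names are made up for illustration.

```python
import re

# Unit suffixes that wrk uses when printing durations.
_UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0, "m": 60000.0, "h": 3600000.0}
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(us|ms|s|m|h)$")


def to_ms(value: str) -> float:
    """Convert a wrk duration such as '807.87ms' or '4.27s' to milliseconds."""
    match = _DURATION.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_TO_MS[unit]


def parse_latency_line(line: str) -> dict:
    """Parse the 'Latency  <avg>  <stdev>  <max>  <+/- stdev %>' summary line."""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "Latency":
        raise ValueError(f"not a wrk latency line: {line!r}")
    return {
        "avg_ms": to_ms(fields[1]),
        "stdev_ms": to_ms(fields[2]),
        "max_ms": to_ms(fields[3]),
    }


# The two 1-minute, 1-connection runs pasted in this task.
before = parse_latency_line("    Latency     1.35s     1.20s    4.56s    81.82%")
after = parse_latency_line("    Latency   807.87ms  812.49ms   4.27s    86.17%")
print(f"avg latency ratio (after / before): {after['avg_ms'] / before['avg_ms']:.2f}")
```

Keep in mind that each run here covers only a few dozen requests, so small differences between runs may well be noise.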

Change 968998 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: remove old rr multilingual form staging

https://gerrit.wikimedia.org/r/968998

Change 968998 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: remove unused deployments from staging

https://gerrit.wikimedia.org/r/968998

calbon set the point value for this task to 3. (Nov 2 2023, 7:07 PM)
calbon moved this task from In Progress to Blocked on the Machine-Learning-Team board.
achou triaged this task as Medium priority. (Nov 2 2023, 7:29 PM)

Change 975008 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] revert kserve upgrades

https://gerrit.wikimedia.org/r/975008

Change 975008 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revert kserve upgrades

https://gerrit.wikimedia.org/r/975008

Change 975205 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: rollback xgboost/catboost models to kserve 0.10

https://gerrit.wikimedia.org/r/975205

We can proceed with this after https://github.com/catboost/catboost/pull/2519 has been included in a new catboost release (support for CgroupsV2)
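For context, under cgroups v2 the CPU limit of a container is exposed in a "cpu.max" file containing "<quota> <period>", or "max <period>" when no limit is set. The sketch below shows how a thread count could be derived from it. It is an illustration only, not the code from the catboost PR; the mount path /sys/fs/cgroup/cpu.max and the function names are assumptions.

```python
import math
import os
from pathlib import Path
from typing import Optional


def cpu_limit_from_cpu_max(cpu_max_text: str) -> Optional[int]:
    """Parse cgroup v2 'cpu.max' content.

    Returns the CPU limit rounded up to whole CPUs (at least 1),
    or None when the quota is 'max' (no limit).
    """
    quota, period = cpu_max_text.split()
    if quota == "max":
        return None
    return max(1, math.ceil(int(quota) / int(period)))


def suggested_thread_count(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Use the cgroup limit when available, else fall back to os.cpu_count()."""
    try:
        limit = cpu_limit_from_cpu_max(Path(cpu_max_path).read_text())
    except (OSError, ValueError):
        # File missing (e.g. cgroups v1 or non-Linux) or unexpected content.
        limit = None
    return limit if limit is not None else (os.cpu_count() or 1)
```

A library that only looks at the host CPU count, and not at this file, would size its thread pool for the whole node rather than for the container's CPU limit.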

Change 976748 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: update docker images to latest versions

https://gerrit.wikimedia.org/r/976748

Change 975205 abandoned by Ilias Sarantopoulos:

[operations/deployment-charts@master] ml-services: rollback xgboost/catboost models to kserve 0.10

Reason:

abandoned in favor of https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/976748 which has the latest updates

https://gerrit.wikimedia.org/r/975205

Change 976748 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update docker images to latest versions

https://gerrit.wikimedia.org/r/976748

Deployed the model server so that it has the latest image. It is still running kserve 0.10. Status is the same and we are waiting for a new catboost release.

Change 998946 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] rrml: upgrade kserve to 0.11.2

https://gerrit.wikimedia.org/r/998946

Since the new catboost release is still pending, we discussed proceeding without it for now by manually limiting the number of threads. I also asked on GitHub whether a release is expected soon.
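In this task the limits are set as env vars in the deployment charts. As a rough in-process equivalent, the same two variables could be set from Python before any numeric library is imported. This is only a sketch: the helper name is hypothetical, the value 2 in the usage comment is an arbitrary example (the values used in the charts are not stated here), and whether a given library honours these variables depends on how it was built.

```python
import os
from typing import MutableMapping


def apply_thread_limits(
    num_threads: int, env: MutableMapping[str, str] = os.environ
) -> None:
    """Set OMP_NUM_THREADS and OMP_THREAD_LIMIT to the same value.

    This needs to run before importing libraries that initialise OpenMP,
    since the runtime typically reads these variables once at start-up.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    value = str(num_threads)
    env["OMP_NUM_THREADS"] = value
    env["OMP_THREAD_LIMIT"] = value


# Example usage at the very top of a (hypothetical) model server entry point:
# apply_thread_limits(2)
# import catboost  # imported only after the limits are in place
```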

Change 998946 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] rrml: upgrade kserve to 0.11.2

https://gerrit.wikimedia.org/r/998946

Change 999570 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: update multilingual revertrisk image

https://gerrit.wikimedia.org/r/999570

Change 999570 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update multilingual revertrisk image

https://gerrit.wikimedia.org/r/999570

Some load test results with wrk on staging: I ran the same ones we previously did.

  • Run for 10s
isaranto@deploy2002:~/inference-services/test/wrk/revertrisk$ wrk -c 1 -t 1 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict  --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   399.34ms  215.46ms 802.59ms   70.00%
    Req/Sec     2.55      2.70    10.00     90.91%
  Latency Distribution
     50%  315.68ms
     75%  534.73ms
     90%  802.59ms
     99%  802.59ms
  11 requests in 10.02s, 4.01KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
  Non-2xx or 3xx responses: 1
Requests/sec:      1.10
Transfer/sec:     410.38B
thread 1 made 13 requests and got 11 responses


isaranto@deploy2002:~/inference-services/test/wrk/revertrisk$ wrk -c 2 -t 2 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict  --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   509.16ms  244.04ms   1.01s    80.00%
    Req/Sec     2.15      2.16    10.00     90.00%
  Latency Distribution
     50%  587.16ms
     75%  667.84ms
     90%  763.25ms
     99%    1.01s
  20 requests in 10.02s, 7.30KB read
  Non-2xx or 3xx responses: 2
Requests/sec:      2.00
Transfer/sec:     746.58B
thread 1 made 12 requests and got 10 responses
thread 2 made 11 requests and got 10 responses

isaranto@deploy2002:~/inference-services/test/wrk/revertrisk$ wrk -c 4 -t 4 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict  --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -- revertrisk.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
thread 3 created logfile wrk_3.log created
thread 4 created logfile wrk_4.log created
Running 10s test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   884.85ms  392.62ms   1.71s    75.00%
    Req/Sec     1.08      1.94    10.00     90.00%
  Latency Distribution
     50%  938.09ms
     75%    1.18s
     90%    1.42s
     99%    1.71s
  40 requests in 10.02s, 14.58KB read
  Non-2xx or 3xx responses: 5
Requests/sec:      3.99
Transfer/sec:      1.46KB
thread 1 made 12 requests and got 10 responses
thread 2 made 11 requests and got 10 responses
thread 3 made 10 requests and got 9 responses
thread 4 made 12 requests and got 11 responses
  • Run for 1m
isaranto@deploy2002:~/inference-services/test/wrk/revertrisk$ wrk -c 1 -t 1 --timeout 5s -s revertrisk.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict  --header "Host: revertrisk-multilingual.revertrisk.wikimedia.org" --latency -d 1m -- revertrisk.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   783.85ms  820.04ms   4.55s    88.64%
    Req/Sec     2.08      1.71    10.00     73.42%
  Latency Distribution
     50%  492.36ms
     75%  941.72ms
     90%    1.87s
     99%    4.55s
  79 requests in 1.00m, 28.67KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
  Non-2xx or 3xx responses: 1
Requests/sec:      1.31
Transfer/sec:     488.45B
thread 1 made 81 requests and got 79 responses

There are no significant differences from the old test results, and I consider it safe to proceed with the kserve upgrade.
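Besides latency, the share of non-2xx/3xx responses in a wrk summary can be checked when comparing runs. A minimal sketch follows; it assumes the standard wrk summary lines ("N requests in ..." and "Non-2xx or 3xx responses: M") and that the request total includes the non-2xx/3xx responses. The function name is made up for illustration.

```python
import re


def non_success_share(wrk_output: str) -> float:
    """Return the fraction of completed requests reported as non-2xx/3xx."""
    total = re.search(r"^\s*(\d+) requests in ", wrk_output, re.MULTILINE)
    if total is None:
        raise ValueError("no 'N requests in ...' summary line found")
    failed = re.search(
        r"^\s*Non-2xx or 3xx responses:\s*(\d+)", wrk_output, re.MULTILINE
    )
    n_total = int(total.group(1))
    # wrk omits the 'Non-2xx or 3xx' line entirely when there were none.
    n_failed = int(failed.group(1)) if failed else 0
    return n_failed / n_total if n_total else 0.0


# Summary lines copied from the 4-thread, 10s run above.
sample = """\
  40 requests in 10.02s, 14.58KB read
  Non-2xx or 3xx responses: 5
Requests/sec:      3.99
"""
print(f"non-2xx/3xx share: {non_success_share(sample):.1%}")
```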

isarantopoulos renamed this task from Upgrade Revert Risk Multilingual docker images to KServe 0.11 to Upgrade Revert Risk Multilingual docker images to KServe 0.11.2. (Feb 12 2024, 5:05 PM)
isarantopoulos closed this task as Resolved.