
[revscoring] Fix Multiprocessing code
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):
A while ago we updated the revscoring model servers for kserve (when the switch to fastapi was made), but did not keep the multiprocessing ones up to date.

What happens?:
Running a model server with multiprocessing enabled fails because the interfaces of the classes are no longer compatible (RevscoringMP extends the Revscoring class).

What should have happened instead?:
They should not fail :)

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
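
For illustration, the failure mode is the kind of interface drift sketched below. The class names follow the description above, but the signatures are hypothetical stand-ins, not the actual code in machinelearning/liftwing/inference-services:

```python
# Hypothetical sketch of the interface drift described above; signatures
# are illustrative stand-ins, not the real inference-services code.
import asyncio
from typing import Any, Dict, Optional


class Revscoring:
    """Parent class, updated during the kserve/fastapi switch: predict()
    became an async coroutine and grew a `headers` parameter."""

    async def predict(
        self, request: Dict[str, Any], headers: Optional[Dict[str, str]] = None
    ) -> Dict[str, Any]:
        return {"predictions": []}


class RevscoringMP(Revscoring):
    """Multiprocessing subclass that was not kept up to date: it still
    overrides the old synchronous, single-argument signature."""

    def predict(self, request: Dict[str, Any]) -> Dict[str, Any]:  # stale override
        return {"predictions": []}


async def serve_one_request() -> None:
    model = RevscoringMP()
    # The server calls predict() with the new signature and awaits it;
    # the stale override rejects the `headers` argument, so the MP
    # server fails at request time.
    await model.predict({"rev_id": 12345}, headers={})


try:
    asyncio.run(serve_one_request())
except TypeError as exc:
    print(f"MP model server fails with: {exc}")
```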

Event Timeline

isarantopoulos renamed this task from [revscoring] Multiprocessing code to [revscoring] Fix Multiprocessing code. Oct 5 2023, 3:27 PM
isarantopoulos claimed this task.
isarantopoulos moved this task from Unsorted to In Progress on the Machine-Learning-Team board.

Change 963754 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] revscoring: fix mp

https://gerrit.wikimedia.org/r/963754

Change 965125 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] revscoring: fix mp

https://gerrit.wikimedia.org/r/965125

Change 963754 abandoned by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] revscoring: fix mp

Reason:

duplicate. Deleting this since I messed up git history

https://gerrit.wikimedia.org/r/963754

Change 965125 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] revscoring: fix mp

https://gerrit.wikimedia.org/r/965125

Change 965479 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: update revscoring

https://gerrit.wikimedia.org/r/965479

Change 965479 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update revscoring and enable articlequality mp

https://gerrit.wikimedia.org/r/965479

Change 965657 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: fix articlequality staging resources

https://gerrit.wikimedia.org/r/965657

Change 965657 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: fix articlequality staging resources

https://gerrit.wikimedia.org/r/965657

Change 965709 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: enable multiprocessing in enwiki articlequality in staging

https://gerrit.wikimedia.org/r/965709

Change 965709 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: enable multiprocessing in enwiki articlequality in staging

https://gerrit.wikimedia.org/r/965709

Initially I tried to deploy a second articlequality model for enwiki in staging, but this isn't trivial to do at the moment: we craft the response by extracting things from the INFERENCE_NAME, which makes it impossible to add a non-standard name, e.g. enwiki-mp-articlequality, which is the one I tried.

So I changed the current deployment to use 2 CPUs and the multiprocessing code.
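
To make the naming constraint concrete, here is a minimal hedged sketch of the assumption; the parsing helper is hypothetical, and the real response-crafting code lives in the inference-services repo:

```python
# Hypothetical illustration of the INFERENCE_NAME constraint: response
# crafting assumes the standard "<wiki>-<model>" pattern, so an extra
# segment ends up in the wrong field.
def parse_inference_name(inference_name: str) -> tuple[str, str]:
    wiki, model_type = inference_name.split("-", 1)
    return wiki, model_type


# Standard deployment name: parsed as intended.
print(parse_inference_name("enwiki-articlequality"))
# -> ('enwiki', 'articlequality')

# Non-standard name for a second staging deployment: the "mp" segment
# pollutes the model type, so the crafted response is wrong.
print(parse_inference_name("enwiki-mp-articlequality"))
# -> ('enwiki', 'mp-articlequality')
```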

Some results from running the load tests:

Single Process

isaranto@deploy2002:~/load_testing$ wrk -c 1 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.04s    53.53ms   1.11s    57.14%
    Req/Sec     0.07      0.26     1.00     92.86%
  Latency Distribution
     50%    1.07s
     75%    1.09s
     90%    1.10s
     99%    1.11s
  56 requests in 1.00m, 25.82KB read
Requests/sec:      0.93
Transfer/sec:     440.65B
thread 1 made 58 requests and got 56 responses

isaranto@deploy2002:~/load_testing$ wrk -c 4 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.85s   877.45ms   6.07s    73.33%
    Req/Sec     0.37      0.49     1.00     63.33%
  Latency Distribution
     50%    3.94s
     75%    4.00s
     90%    4.96s
     99%    6.07s
  60 requests in 1.00m, 27.69KB read
Requests/sec:      1.00
Transfer/sec:     471.74B
thread 1 made 65 requests and got 60 responses

isaranto@deploy2002:~/load_testing$ wrk -c 4 -t 2 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.85s   824.06ms   6.02s    66.67%
    Req/Sec     0.12      0.32     1.00     88.33%
  Latency Distribution
     50%    3.96s
     75%    4.05s
     90%    4.96s
     99%    6.02s
  60 requests in 1.00m, 27.69KB read
Requests/sec:      1.00
Transfer/sec:     472.47B
thread 1 made 33 requests and got 30 responses
thread 2 made 32 requests and got 30 responses

isaranto@deploy2002:~/load_testing$ wrk -c 20 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.33s     6.07s   30.86s    77.05%
    Req/Sec     0.48      0.50     1.00     52.46%
  Latency Distribution
     50%   18.20s
     75%   19.39s
     90%   21.27s
     99%   30.86s
  61 requests in 1.00m, 28.20KB read
Requests/sec:      1.02
Transfer/sec:     481.21B
thread 1 made 82 requests and got 61 responses


isaranto@deploy2002:~/load_testing$ wrk -c 20 -t 2 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.80s     6.28s   32.13s    73.77%
    Req/Sec     0.21      0.41     1.00     78.69%
  Latency Distribution
     50%   16.50s
     75%   19.54s
     90%   21.48s
     99%   32.13s
  61 requests in 1.00m, 28.20KB read
Requests/sec:      1.01
Transfer/sec:     480.42B

Multiprocessing (2 workers/CPUs)

isaranto@deploy2002:~/load_testing$ wrk -c 1 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.44s    73.67ms   1.68s    85.00%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    1.42s
     75%    1.45s
     90%    1.56s
     99%    1.68s
  40 requests in 1.00m, 18.46KB read
Requests/sec:      0.67
Transfer/sec:     314.98B
thread 1 made 42 requests and got 40 responses

isaranto@deploy2002:~/load_testing$ wrk -c 4 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.96s   139.43ms   3.29s    74.68%
    Req/Sec     3.44      3.04    10.00     61.11%
  Latency Distribution
     50%    2.96s
     75%    3.04s
     90%    3.13s
     99%    3.29s
  79 requests in 1.00m, 36.45KB read
Requests/sec:      1.32
Transfer/sec:     622.11B
thread 1 made 84 requests and got 79 responses

isaranto@deploy2002:~/load_testing$ wrk -c 4 -t 2 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  2 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.97s   163.30ms   3.36s    69.62%
    Req/Sec     1.83      2.64    10.00     79.22%
  Latency Distribution
     50%    2.98s
     75%    3.08s
     90%    3.19s
     99%    3.36s
  79 requests in 1.00m, 36.45KB read
Requests/sec:      1.31
Transfer/sec:     621.08B
thread 1 made 42 requests and got 39 responses
thread 2 made 41 requests and got 40 responses

isaranto@deploy2002:~/load_testing$ wrk -c 20 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.55s     1.16s   15.61s    84.42%
    Req/Sec     4.09      1.82    10.00     81.82%
  Latency Distribution
     50%   14.98s
     75%   15.19s
     90%   15.29s
     99%   15.61s
  77 requests in 1.00m, 35.61KB read
Requests/sec:      1.28
Transfer/sec:     607.64B
thread 1 made 98 requests and got 77 responses

isaranto@deploy2002:~/load_testing$ wrk -c 20 -t 2 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
thread 2 created logfile wrk_2.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.39s     1.15s   15.28s    84.81%
    Req/Sec     2.16      2.00    10.00     68.83%
  Latency Distribution
     50%   14.82s
     75%   14.98s
     90%   15.12s
     99%   15.28s
  79 requests in 1.00m, 36.53KB read
Requests/sec:      1.31
Transfer/sec:     622.41B
thread 1 made 50 requests and got 39 responses
thread 2 made 50 requests and got 40 responses

From the chaos above, the following results are noteworthy:

p     Single Process    Multiprocess    connections    threads
50%   18.20s            14.98s          20             1
75%   19.39s            15.19s          20             1
90%   21.27s            15.29s          20             1
99%   30.86s            15.61s          20             1

This shows that mp with 2 workers can handle this type of request in a more stable way, without returning any non-2xx errors.

I suggest we enable this in production just for enwiki-articlequality and monitor its behavior to see what improvements it brings.
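
For context, enabling this on the model-server side amounts to starting kserve with more than one worker process. A minimal sketch, assuming kserve's ModelServer accepts a workers argument, with a placeholder model class and an assumed MP_WORKERS env var (not necessarily what the Lift Wing charts set):

```python
# Minimal sketch of a multiprocessing-enabled kserve model server. The
# model class is a placeholder and MP_WORKERS is an assumed convention;
# the actual wiring lives in the inference-services repo.
import os
from typing import Any, Dict, Optional

import kserve


class ArticleQualityModel(kserve.Model):
    def __init__(self, name: str):
        super().__init__(name)
        # Real code would load the revscoring model file here.
        self.ready = True

    async def predict(
        self, request: Dict[str, Any], headers: Optional[Dict[str, str]] = None
    ) -> Dict[str, Any]:
        return {"predictions": []}


if __name__ == "__main__":
    workers = int(os.environ.get("MP_WORKERS", "2"))
    model = ArticleQualityModel("enwiki-articlequality")
    # Each worker is a separate process with its own copy of the model,
    # so 2 workers need roughly 2 CPUs (and twice the memory).
    kserve.ModelServer(workers=workers).start([model])
```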

+1, looks great! I would even go further and test it with 4 cores, to see if it improves :)

With 3 CPUs and 2 workers we have the following results:

isaranto@deploy2002:~/load_testing$ wrk -c 1 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.39s    94.59ms   1.64s    71.43%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    1.38s
     75%    1.42s
     90%    1.56s
     99%    1.64s
  42 requests in 1.00m, 19.38KB read
Requests/sec:      0.70
Transfer/sec:     330.75B
thread 1 made 44 requests and got 42 responses
isaranto@deploy2002:~/load_testing$ wrk -c 20 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    13.89s     1.03s   14.73s    85.00%
    Req/Sec     4.46      2.29    10.00     75.68%
  Latency Distribution
     50%   14.29s
     75%   14.41s
     90%   14.49s
     99%   14.73s
  80 requests in 1.00m, 36.99KB read
Requests/sec:      1.33
Transfer/sec:     630.32B
thread 1 made 101 requests and got 80 responses

With 5 CPUs and 4 workers:

isaranto@deploy2002:~/load_testing$ wrk -c 1 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.43s    53.16ms   1.57s    85.37%
    Req/Sec     0.00      0.00     0.00    100.00%
  Latency Distribution
     50%    1.42s
     75%    1.44s
     90%    1.52s
     99%    1.57s
  41 requests in 1.00m, 18.92KB read
Requests/sec:      0.68
Transfer/sec:     322.34B
thread 1 made 43 requests and got 41 responses

isaranto@deploy2002:~/load_testing$ wrk -c 20 -t 1 --timeout 50s -s revscoring.lua https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict --latency -d 60 -- articlequality.input
thread 1 created logfile wrk_1.log created
Running 1m test @ https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict
  1 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.21s   839.62ms  10.26s    88.33%
    Req/Sec     4.09      2.23    10.00     86.79%
  Latency Distribution
     50%    9.46s
     75%    9.62s
     90%    9.82s
     99%   10.09s
  120 requests in 1.00m, 55.37KB read
Requests/sec:      2.00
Transfer/sec:      0.92KB
thread 1 made 141 requests and got 120 responses

So the summary results we have for 20 connections are the following:

p     Single Process    Multiprocess (2 workers)    Multiprocess (4 workers)
      (61 requests)     (79 requests)               (120 requests)
50%   18.20s            14.98s                      9.46s
75%   19.39s            15.19s                      9.62s
90%   21.27s            15.29s                      9.82s
99%   30.86s            15.61s                      10.09s

Change 965933 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: enable multiprocessing for articlequality production

https://gerrit.wikimedia.org/r/965933

Change 965933 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: enable multiprocessing for articlequality production

https://gerrit.wikimedia.org/r/965933

Change 966142 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: disable mp for inference in articlequality

https://gerrit.wikimedia.org/r/966142

Change 966142 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: disable mp for inference in articlequality

https://gerrit.wikimedia.org/r/966142