
Deploy revert-risk-model to production
Open, Needs Triage · Public

Description

The Research team, in collaboration with the ML team, is working on a language-agnostic model to predict reverts on Wikipedia. See T314385.

We'd like to deploy early versions of the model to LiftWing's experimental namespace. This task tracks the status of the production deployment.

Event Timeline

Change 849478 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: add revertrisk model server and pipeline

https://gerrit.wikimedia.org/r/849478

Change 849480 had a related patch set uploaded (by AikoChou; author: AikoChou):

[integration/config@master] inference-services: add revertrisk pipelines

https://gerrit.wikimedia.org/r/849480

Change 849480 merged by jenkins-bot:

[integration/config@master] inference-services: add revertrisk pipelines

https://gerrit.wikimedia.org/r/849480

Change 849478 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: add revertrisk model server and pipeline

https://gerrit.wikimedia.org/r/849478

The model has been uploaded to Thanos Swift:

aikochou@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/experimental/revertrisk/20221026144108/
2022-10-26 14:44       499465  s3://wmf-ml-models/experimental/revertrisk/20221026144108/model.pkl

The model file was downloaded from knowledge_integrity/pretrained_models.
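As a side note, the .pkl suffix in the listing above suggests a standard pickle artifact, so loading it back on the model server side is a plain pickle.load(). A minimal sketch, round-tripping a stand-in object since the real model file isn't available here:

```python
import os
import pickle
import tempfile

# Stand-in for the real model object; names here are illustrative only.
dummy_model = {"name": "revertrisk", "snapshot": "20221026144108"}

# Write and read back a model.pkl the same way a storage initializer
# would place it on disk for the model server to load.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(dummy_model, f)

with open(path, "rb") as f:
    model = pickle.load(f)

print(model)
```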

Change 849627 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: add revert-risk-model isvc

https://gerrit.wikimedia.org/r/849627

Change 849627 merged by Elukey:

[operations/deployment-charts@master] ml-services: add revert-risk-model isvc

https://gerrit.wikimedia.org/r/849627

Change 850408 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: allow access to MediaWiki API from internal endpoint

https://gerrit.wikimedia.org/r/850408

Change 850408 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: allow access to MediaWiki API from internal endpoint

https://gerrit.wikimedia.org/r/850408

Change 850452 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update revert-risk's docker image

https://gerrit.wikimedia.org/r/850452

Change 850452 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update revert-risk's docker image

https://gerrit.wikimedia.org/r/850452

The revert-risk model has been deployed to production today. :)

Yeah! Thanks @achou! Please, can you write an example of how to hit the endpoint here?

@diego I added a section https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Usage in our documentation about how to access inference services internally.

You can adapt the code example in the doc using the following values:

  • url: https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict
  • host: revert-risk-model.experimental.wikimedia.org
  • input data: {"lang": "en", "rev_id": 1083325118} (example)

and it should work. Let me know if there is any problem. :)
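For reference, the values above can be put together into a minimal Python sketch using only the standard library; the actual send is left commented out, since the endpoint is only reachable from inside the cluster:

```python
import json
import urllib.request

# Endpoint and Host header from the comment above: the Host header is what
# routes the request to the revert-risk-model inference service.
URL = "https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict"
HOST = "revert-risk-model.experimental.wikimedia.org"

def build_request(lang: str, rev_id: int) -> urllib.request.Request:
    """Build the prediction request; the caller sends it from a host
    with access to the internal endpoint."""
    payload = json.dumps({"lang": lang, "rev_id": rev_id}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=payload,
        headers={"Host": HOST, "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("en", 1083325118)
# To actually send it (internal network only):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```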

Some load test results:

  • 1 connection
aikochou@deploy1002:~/rrr$ wrk -c 1 -t 1 --timeout 5s -s inference.lua https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict --latency
Running 10s test @ https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   123.76ms   29.09ms 339.32ms   93.90%
    Req/Sec     8.89      2.20    10.00     79.01%
  Latency Distribution
     50%  117.50ms
     75%  122.46ms
     90%  141.52ms
     99%  339.32ms
  81 requests in 10.01s, 21.75KB read
Requests/sec:      8.09
Transfer/sec:      2.17KB
  • 3 connections
aikochou@deploy1002:~/rrr$ wrk -c 3 -t 3 --timeout 5s -s inference.lua https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict --latency
Running 10s test @ https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict
  3 threads and 3 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   129.22ms   18.57ms 316.34ms   85.34%
    Req/Sec     8.54      2.30    10.00     71.43%
  Latency Distribution
     50%  127.24ms
     75%  135.52ms
     90%  144.63ms
     99%  194.68ms
  231 requests in 10.02s, 62.04KB read
Requests/sec:     23.06
Transfer/sec:      6.19KB
  • 5 connections
aikochou@deploy1002:~/rrr$ wrk -c 5 -t 5 --timeout 5s -s inference.lua https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict --latency
Running 10s test @ https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict
  5 threads and 5 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   139.48ms   24.52ms 410.64ms   87.53%
    Req/Sec     8.02      2.51    10.00     61.56%
  Latency Distribution
     50%  135.93ms
     75%  148.94ms
     90%  161.25ms
     99%  212.92ms
  359 requests in 10.02s, 96.41KB read
Requests/sec:     35.83
Transfer/sec:      9.62KB
  • 10 connections
aikochou@deploy1002:~/rrr$ wrk -c 10 -t 10 --timeout 5s -s inference.lua https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict --latency
Running 10s test @ https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   173.77ms  156.91ms   1.38s    93.56%
    Req/Sec     8.02      2.67    10.00     63.12%
  Latency Distribution
     50%  134.02ms
     75%  150.65ms
     90%  190.85ms
     99%  993.63ms
  526 requests in 10.02s, 141.26KB read
  Socket errors: connect 2, read 0, write 0, timeout 0
Requests/sec:     52.50
Transfer/sec:     14.10KB
  • 20 connections
aikochou@deploy1002:~/rrr$ wrk -c 20 -t 20 --timeout 5s -s inference.lua https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict --latency
Running 10s test @ https://inference.discovery.wmnet:30443/v1/models/revert-risk-model:predict
  20 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   157.03ms   25.39ms 330.88ms   82.81%
    Req/Sec     7.18      2.59    10.00     49.51%
  Latency Distribution
     50%  152.70ms
     75%  165.81ms
     90%  181.38ms
     99%  257.99ms
  1012 requests in 10.02s, 271.78KB read
  Socket errors: connect 4, read 0, write 0, timeout 0
Requests/sec:    101.01
Transfer/sec:     27.13KB

Overall this looks very nice compared to the revscoring models! The average latency stayed in the 100–200 ms range, and the RPS scaled as we increased the number of connections.

Grafana metrics:

Thanks a lot for sharing these results here, @achou! I do see that we get more socket connect errors as the number of connections increases. Is that something we should be concerned about? The wrk docs don't seem to say anything about these errors, but some issues on the repo mention that connect errors in particular can occur when wrk runs out of file descriptors. Those reports involve opening hundreds of connections, though, so I'm not sure that's the case here.

@MunizaA The socket connect errors seem to be caused by the per-process open-file limit on Linux, according to the article. They don't come from the model server, so I think we don't need to worry too much. Also note that deploy1002 is the deployment server not only for ML services but also for MediaWiki and all Wikimedia Kubernetes services, so wrk may hit the limit more easily there.
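For what it's worth, the ceiling in question is easy to inspect; this is just an illustration of where the limit comes from, not a measurement from deploy1002:

```python
import resource

# Each open wrk connection consumes a file descriptor. Once the soft limit
# for the process is exhausted, further connect() calls fail, which wrk
# reports as "Socket errors: connect N".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
```

The soft limit is what bites first; it can be raised up to the hard limit with `ulimit -n` before running the benchmark.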

The revertrisk model has been deployed to production. I'm going to mark this as RESOLVED. We'll open new tasks for future model deployments as needed.

To summarize the steps:

  1. Add the revertrisk model server and pipeline config to the inference-services repository
  2. Add the new pipeline to the integration/config repository
  3. Upload the model to Thanos Swift
  4. Add the revertrisk inference service to the deployment-charts repository and wait for an ML SRE to +2 and merge
  5. Deploy to staging (ml-staging-codfw) and test the model
  6. Deploy to production (ml-serve-eqiad & ml-serve-codfw) and test the model (simple curl and/or wrk load test)