In T323613, we have tested the multilingual revert-risk-model service in ml-sandbox. The next step is to deploy the service to Lift Wing. This task serves to track the status of the production deployment.
Related Objects
- Mentioned In
- rMLIS968b05add297: revertrisk: upgrade to multilingual revertrisk model
- Mentioned Here
- T321594: Deploy revert-risk-model to production
- T329936: Create separate blubberfile and pipeline for revert-risk multilingual model
- T325349: Update torch's settings in the Knowledge Integrity repo
- T323613: Test MultilingualRevertRiskModel inference service on ml-sandbox
Event Timeline
Change 861434 had a related patch set uploaded (by AikoChou; author: AikoChou):
[machinelearning/liftwing/inference-services@main] revertrisk: upgrade to multilingual revertrisk model
The model has been uploaded to Thanos Swift:
aikochou@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/experimental/revertrisk/20221214175551/
2022-12-14 18:00 2647804395 s3://wmf-ml-models/experimental/revertrisk/20221214175551/model.pkl
The model file is about 2.5 GiB (2647804395 bytes).
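To double-check the size figure, the listing line above can be parsed and converted. This is only a minimal sketch: the helper names `parse_s3cmd_ls_line` and `to_gib` are made up for illustration, and it assumes each object line has the four whitespace-separated fields seen above (date, time, size in bytes, URI).

```python
from typing import NamedTuple


class S3Entry(NamedTuple):
    date: str
    time: str
    size_bytes: int
    uri: str


def parse_s3cmd_ls_line(line: str) -> S3Entry:
    """Split one 's3cmd ls' object line into date, time, size and URI.

    Assumes the four-field layout shown in the listing above.
    """
    date, time, size, uri = line.split(maxsplit=3)
    return S3Entry(date, time, int(size), uri.strip())


def to_gib(size_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB)."""
    return size_bytes / 2**30


entry = parse_s3cmd_ls_line(
    "2022-12-14 18:00 2647804395 "
    "s3://wmf-ml-models/experimental/revertrisk/20221214175551/model.pkl"
)
print(f"{entry.uri}: {to_gib(entry.size_bytes):.2f} GiB")
```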
Change 861434 merged by jenkins-bot:
[machinelearning/liftwing/inference-services@main] revertrisk: upgrade to multilingual revertrisk model
Change 868442 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: update revertrisk docker images
Change 868442 merged by Elukey:
[operations/deployment-charts@master] ml-services: update revertrisk docker images
Current status:
The revertrisk-multilingual model was successfully deployed to ml-staging yesterday!
Production image tag: 2022-12-22-150637-publish
For the moment, the prod image installs KI (knowledge_integrity) from https://gitlab.wikimedia.org/elukey/knowledge_integrity, which removes torch from the dependencies, and installs the CPU version of torch 1.13.1 via requirements.txt to avoid nvidia/cuda-related dependencies (see T325349).
Next step:
@MunizaA is organizing dependency groups in the knowledge_integrity repository, so there will be a dependency group for Lift Wing. When it's ready, we'll rebuild the images and update to new models that work with transformers 4.25.1.
A new model that works with transformers 4.25.1 and torch 1.13.1 has been uploaded. This was needed mainly because of joblib serialisation specifics: the model had to be reloaded with the new transformers version and the model dump re-serialised. The new model file is:
aikochou@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/experimental/revertrisk/20230201095010/
2023-02-01 09:54 2647806802 s3://wmf-ml-models/experimental/revertrisk/20230201095010/model.pkl
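The reload-and-reserialise step mentioned above could look roughly like the sketch below. It is not the script that was actually used: the function name `reserialize_model` and its `load`/`dump` parameters are hypothetical, and it assumes the old dump can be loaded directly in an environment that already has the new transformers and torch versions installed, which may not hold for every model.

```python
def reserialize_model(src_path, dst_path, load=None, dump=None):
    """Load a serialized model and write it back out with the installed libraries.

    `load` and `dump` default to joblib's functions (assumed serializer);
    they are parameters only so the sketch can be tried without joblib.
    """
    if load is None or dump is None:
        import joblib  # assumption: the model dump was written with joblib

        load = load or joblib.load
        dump = dump or joblib.dump

    # Unpickling imports whatever transformers/torch versions are installed
    # in the current environment.
    model = load(src_path)
    # Writing the object back produces a dump tied to those versions.
    dump(model, dst_path)
    return dst_path
```

Run inside an environment with transformers 4.25.1 and torch 1.13.1, a call such as `reserialize_model("model_old.pkl", "model.pkl")` (both paths hypothetical) would produce the file to upload.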
Current status:
- the latest multilingual model was deployed in ml-staging-codfw
- working on a separate blubberfile and pipeline for the model, so it no longer shares the pipeline with the revert-risk language-agnostic model. (see T329936)
Next steps:
- deploy the latest multilingual model to production
- adjust the memory limit range for ml-services, because this isvc needs at least 4 CPUs and 6Gi of memory
- measure the latency
Once this task is done, together with T321594, we will have two revert-risk isvcs in production: one for the language-agnostic model and one for the multilingual model.
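For the "measure the latency" step listed above, one simple approach is to time repeated calls and summarise them. The sketch below is an assumption about how that could be done, not the team's actual method; `measure_latency` is a hypothetical helper, and `call` stands for any zero-argument function that sends one request to the isvc.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(call: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Time `runs` sequential invocations of `call`; return seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Sequential timing like this only captures single-request latency; behaviour under concurrent load would need a dedicated load-testing tool.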
Change 891252 had a related patch set uploaded (by AikoChou; author: AikoChou):
[operations/deployment-charts@master] ml-services: update revertrisk images and increase limitranges for ml-eqiad/codfw
Change 891252 merged by Elukey:
[operations/deployment-charts@master] ml-services: update revertrisk images and increase limitranges for ml-eqiad/codfw
A CI pipeline for revertrisk-multilingual has been added; the production images can be found at:
https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-revertrisk-multilingual/tags/
New images (upgraded to Debian bullseye and Python 3.9) are currently deployed only on ml-staging. In prod there is a complication with limits, etc., that will be solved when we upgrade to k8s 1.23!
Test the model after deployment:
aikochou@deploy1002:~/rrr$ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict" -d @input.json -H "Host: revertrisk-multilingual-predictor-default.experimental.wikimedia.org" --http1.1
{"lang": "en", "rev_id": 1096086751, "score": {"prediction": false, "probability": {"true": 0.3770119460413965, "false": 0.6229880539586035}}}

real 0m6.514s
user 0m0.010s
sys 0m0.004s
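The same request can be sketched in Python, which is handy for scripting checks after a deployment. Several things here are assumptions: the contents of input.json are not shown above, so the payload with `lang` and `rev_id` is inferred from the response; the `Content-Type: application/json` header is added here and is not what `curl -d` sends by default; and `build_request` and `parse_score` are hypothetical helper names. The endpoint is internal, so actually sending the request only works from a host that can reach it and trusts its certificate.

```python
import json
import urllib.request

URL = (
    "https://inference-staging.svc.codfw.wmnet:30443"
    "/v1/models/revertrisk-multilingual:predict"
)
HOST = "revertrisk-multilingual-predictor-default.experimental.wikimedia.org"


def build_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the staging isvc."""
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        # The Host header selects the isvc, as in the curl call above.
        headers={"Host": HOST, "Content-Type": "application/json"},
        method="POST",
    )


def parse_score(body: str):
    """Return (prediction, probability of 'true') from a response body."""
    score = json.loads(body)["score"]
    return score["prediction"], score["probability"]["true"]


# Assumed payload shape, inferred from the fields echoed in the response.
request = build_request({"lang": "en", "rev_id": 1096086751})

# Sending it would look like this (only from a host with access):
# with urllib.request.urlopen(request) as resp:
#     print(parse_score(resp.read().decode("utf-8")))
```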