Deploy revert-risk multilingual model to production
Closed, Resolved · Public

Authored By: achou, Dec 14 2022, 5:45 PM

Description

In T323613, we tested the multilingual revert-risk model service in ml-sandbox. The next step is to deploy the service to Lift Wing. This task tracks the status of the production deployment.

Details

Subject	Repo	Branch	Lines +/-
ml-services: update revertrisk images and increase limitranges for ml-eqiad/codfw	operations/deployment-charts	master	+30 -28
ml-services: update revertrisk docker images	operations/deployment-charts	master	+29 -6
revertrisk: upgrade to multilingual revertrisk model	machinelearning/liftwing/inference-services	main	+44 -26


Related Objects

Mentioned In: rMLIS968b05add297: revertrisk: upgrade to multilingual revertrisk model
Mentioned Here: T321594: Deploy revert-risk-model to production
T329936: Create separate blubberfile and pipeline for revert-risk multilingual model
T325349: Update torch's settings in the Knowledge Integrity repo
T323613: Test MultilingualRevertRiskModel inference service on ml-sandbox

Event Timeline

achou created this task. · Dec 14 2022, 5:45 PM

Restricted Application added a subscriber: Aklapper. · Dec 14 2022, 5:45 PM

achou added subscribers: calbon, diego, MunizaA, Trokhymovych. · Dec 14 2022, 5:46 PM

Change 861434 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: upgrade to multilingual revertrisk model

https://gerrit.wikimedia.org/r/861434

gerritbot added a project: Patch-For-Review. · Dec 14 2022, 5:47 PM

achou moved this task from Unsorted to In Progress on the Machine-Learning-Team board. · Dec 14 2022, 5:50 PM

The model has been uploaded to Thanos Swift:

aikochou@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/experimental/revertrisk/20221214175551/
2022-12-14 18:00   2647804395  s3://wmf-ml-models/experimental/revertrisk/20221214175551/model.pkl

The file is about 2.5 GiB (2,647,804,395 bytes).
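To double-check the size reported by the listing, here is a minimal Python sketch that parses one object line of `s3cmd ls` output and converts the byte count. It assumes the four-column layout shown above (date, time, size in bytes, URI) and does not handle `DIR` lines; the helper names are hypothetical, not part of any ML-team tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class S3Object:
    modified: datetime
    size_bytes: int
    uri: str


def parse_s3cmd_ls_line(line: str) -> S3Object:
    """Parse one object line of `s3cmd ls` output: date, time, size, URI."""
    date, time, size, uri = line.split(maxsplit=3)
    modified = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M")
    return S3Object(modified, int(size), uri.strip())


def to_gib(size_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return size_bytes / 2**30


line = "2022-12-14 18:00   2647804395  s3://wmf-ml-models/experimental/revertrisk/20221214175551/model.pkl"
obj = parse_s3cmd_ls_line(line)
# Prints the decimal (GB) and binary (GiB) sizes of the uploaded model.
print(f"{obj.uri}: {obj.size_bytes / 1e9:.2f} GB, {to_gib(obj.size_bytes):.2f} GiB")
```

The byte count works out to about 2.65 GB in decimal units or 2.47 GiB in binary units, which matches the size noted for the upload.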

Change 861434 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: upgrade to multilingual revertrisk model

https://gerrit.wikimedia.org/r/861434

achou mentioned this in rMLIS968b05add297: revertrisk: upgrade to multilingual revertrisk model. · Dec 15 2022, 4:04 PM

Maintenance_bot removed a project: Patch-For-Review. · Dec 15 2022, 4:30 PM

Change 868442 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update revertrisk docker images

https://gerrit.wikimedia.org/r/868442

gerritbot added a project: Patch-For-Review. · Dec 15 2022, 4:50 PM

Change 868442 merged by Elukey:

[operations/deployment-charts@master] ml-services: update revertrisk docker images

https://gerrit.wikimedia.org/r/868442

Maintenance_bot removed a project: Patch-For-Review. · Dec 16 2022, 4:30 PM

Current status:

The revertrisk-multilingual model was successfully deployed to ml-staging yesterday!
Production image tag: 2022-12-22-150637-publish

For the moment, the prod image installs KI (knowledge_integrity) from https://gitlab.wikimedia.org/elukey/knowledge_integrity, a version that removes torch from the dependencies, and installs the CPU build of torch 1.13.1 via requirements.txt to avoid NVIDIA/CUDA-related dependencies (see T325349).
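The CPU-only torch build is usually pulled from PyTorch's CPU wheel index (for example `torch==1.13.1+cpu` with `--extra-index-url https://download.pytorch.org/whl/cpu`); the exact line used in the Lift Wing requirements.txt is not shown here. As a hedged sanity check, a sketch like the following could list any CUDA/NVIDIA runtime packages that still ended up in an image. It assumes that, on Linux, the default PyPI torch 1.13.x wheel depends on `nvidia-*-cu11` packages while the CPU build does not; the helper names are hypothetical.

```python
from importlib.metadata import distributions
from typing import Iterable, List

# Package-name prefixes that indicate CUDA/NVIDIA runtime wheels.
# Assumption: the default PyPI torch 1.13.x wheel on Linux depends on
# nvidia-*-cu11 packages, while the CPU-only build does not.
CUDA_PREFIXES = ("nvidia-",)


def find_cuda_packages(names: Iterable[str]) -> List[str]:
    """Return the names that look like CUDA/NVIDIA runtime packages, sorted."""
    hits = []
    for name in names:
        # Normalize case and underscores so "nvidia_cublas_cu11" also matches.
        normalized = name.lower().replace("_", "-")
        if normalized.startswith(CUDA_PREFIXES):
            hits.append(name)
    return sorted(hits)


def installed_package_names() -> List[str]:
    """Names of all distributions installed in the running environment."""
    names = []
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:
            names.append(name)
    return names


if __name__ == "__main__":
    found = find_cuda_packages(installed_package_names())
    print("CUDA-related packages:", found or "none")
```

Running it inside the built image would be expected to report "none" if the CPU-only setup worked as intended.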

Next step:

@MunizaA is organizing dependency groups in the knowledge_integrity repository, so there will be a dependency group for Lift Wing. Once it's ready, we'll rebuild the images and update to the new models (which work with transformers 4.25.1).

A new model that works with transformers 4.25.1 and torch 1.13.1 has been uploaded. This was needed mainly because of joblib serialisation specifics: the model had to be reloaded with the new transformers version and the model dump reserialized.

aikochou@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/experimental/revertrisk/20230201095010/
2023-02-01 09:54   2647806802  s3://wmf-ml-models/experimental/revertrisk/20230201095010/model.pkl
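The reload-and-reserialize step can be sketched roughly as below. This is a minimal illustration, not the script that produced the upload above: it assumes the dump is a joblib pickle that still unpickles cleanly under the new library versions (if it does not, the model objects would have to be rebuilt from saved weights first, which this sketch does not cover). `load_model`, `dump_model` and `reserialize` are hypothetical names, and the demo uses a stand-in dict instead of the real model.

```python
import os
import pickle
import tempfile

try:
    import joblib  # the model.pkl dumps are assumed to be joblib pickles
except ImportError:
    joblib = None  # fall back to stdlib pickle so the sketch still runs


def load_model(path):
    """Load a model dump with joblib if available, else plain pickle."""
    if joblib is not None:
        return joblib.load(path)
    with open(path, "rb") as f:
        return pickle.load(f)


def dump_model(model, path):
    """Write a model dump with joblib if available, else plain pickle."""
    if joblib is not None:
        joblib.dump(model, path)
    else:
        with open(path, "wb") as f:
            pickle.dump(model, f)


def reserialize(src_path, dst_path):
    """Load a dump in the current environment and write it back out.

    Run this in the environment that has the target versions installed
    (here transformers 4.25.1 / torch 1.13.1), so the new dump is produced
    by those versions.
    """
    model = load_model(src_path)
    dump_model(model, dst_path)
    return model


# Tiny round-trip demo with a stand-in object instead of the real model.
workdir = tempfile.mkdtemp()
old_dump = os.path.join(workdir, "old_model.pkl")
new_dump = os.path.join(workdir, "model.pkl")
dump_model({"weights": [1, 2, 3]}, old_dump)
restored = reserialize(old_dump, new_dump)
```

The important part is where it runs rather than the code itself: the dump must be written by the same library versions that will later load it in the inference service.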

achou moved this task from In Progress to Complete Q3 2022/23 on the Machine-Learning-Team board. · Feb 7 2023, 2:58 PM

elukey closed this task as Resolved. · Feb 7 2023, 3:54 PM

achou renamed this task from Deploy MultilingualRevertRiskModel to production to Deploy revert-risk multilingual model to production. · Feb 20 2023, 8:33 AM

achou reopened this task as In Progress.

Current status:

- The latest multilingual model has been deployed in ml-staging-codfw.
- Working on a separate blubberfile and pipeline for the model, so that it no longer shares a pipeline with the language-agnostic revert-risk model (see T329936).

Next steps:

- Deploy the latest multilingual model to production.
  - Raise the ml-services limit ranges, since this isvc needs at least 4 CPUs and 6Gi of memory.
- Measure the latency.
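Why the limit ranges block the deployment can be illustrated with a small check that compares the isvc's needs against a LimitRange per-container maximum. This is a hedged sketch: the 4 CPU / 6Gi figures come from this task, but `old_max` is a made-up "before" value for illustration, and a real LimitRange also has defaults, minimums and request/limit ratios that are not modeled here. `parse_quantity` only handles the common Kubernetes quantity suffixes.

```python
from decimal import Decimal

# Binary and decimal suffixes used by Kubernetes resource quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),  # millicores, e.g. "500m" CPU
}


def parse_quantity(q: str) -> Decimal:
    """Parse a Kubernetes quantity such as '6Gi', '4' or '500m'."""
    # Check two-letter suffixes before one-letter ones ("Mi" before "m"/"M").
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def violations(requests: dict, limit_max: dict) -> list:
    """List resources whose requested amount exceeds the LimitRange max."""
    out = []
    for resource, amount in requests.items():
        cap = limit_max.get(resource)
        if cap is not None and parse_quantity(amount) > parse_quantity(cap):
            out.append(resource)
    return out


# From this task: the isvc needs at least 4 CPUs and 6Gi of memory.
isvc_needs = {"cpu": "4", "memory": "6Gi"}
# Hypothetical per-container max before the limit ranges were raised.
old_max = {"cpu": "2", "memory": "4Gi"}
print(violations(isvc_needs, old_max))  # ['cpu', 'memory']
```

With a maximum that covers both values, `violations` returns an empty list, which is the state the limit-range patch is meant to reach.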

Once this task is done, together with T321594 we will have two revert-risk isvcs in production: one serving the language-agnostic model and the other serving the multilingual model.

achou moved this task from Complete Q3 2022/23 to In Progress on the Machine-Learning-Team board. · Feb 20 2023, 8:39 AM

Change 891252 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update revertrisk images and increase limitranges for ml-eqiad/codfw

https://gerrit.wikimedia.org/r/891252

gerritbot added a project: Patch-For-Review. · Feb 22 2023, 10:56 AM

Change 891252 merged by Elukey:

[operations/deployment-charts@master] ml-services: update revertrisk images and increase limitranges for ml-eqiad/codfw

https://gerrit.wikimedia.org/r/891252

Maintenance_bot removed a project: Patch-For-Review. · Feb 22 2023, 3:10 PM

The CI pipeline for revertrisk-multilingual has been added; the production images can be found at:
https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-revertrisk-multilingual/tags/
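To find the newest published image programmatically, one option is the standard Docker Registry HTTP API v2 `tags/list` endpoint. This is a sketch under assumptions: that docker-registry.wikimedia.org exposes `/v2/<image>/tags/list` for this image (the link above is the web UI), and that publish tags keep the timestamp-like pattern of the earlier production tag `2022-12-22-150637-publish`. Pagination of the tag list is ignored, and the helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime

REGISTRY = "https://docker-registry.wikimedia.org"
IMAGE = "wikimedia/machinelearning-liftwing-inference-services-revertrisk-multilingual"


def tags_url(registry: str, image: str) -> str:
    """Docker Registry HTTP API v2 endpoint that lists an image's tags."""
    return f"{registry}/v2/{image}/tags/list"


def fetch_tags(registry: str = REGISTRY, image: str = IMAGE) -> list:
    """Fetch the tag list over the network (not called at import time)."""
    with urllib.request.urlopen(tags_url(registry, image), timeout=10) as resp:
        return json.load(resp).get("tags") or []


def publish_time(tag: str):
    """Parse tags like '2022-12-22-150637-publish'; return None otherwise."""
    if not tag.endswith("-publish"):
        return None
    try:
        return datetime.strptime(tag[: -len("-publish")], "%Y-%m-%d-%H%M%S")
    except ValueError:
        return None


def latest_publish_tag(tags: list):
    """Return the publish tag with the newest timestamp, or None."""
    dated = [(publish_time(t), t) for t in tags]
    dated = [(ts, t) for ts, t in dated if ts is not None]
    return max(dated)[1] if dated else None
```

Usage would be `latest_publish_tag(fetch_tags())`; tags that do not match the assumed pattern (for example `latest`) are simply skipped.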

New images (upgraded to Debian bullseye and Python 3.9) are currently deployed only on ml-staging. In production there is a complication with limits, among other things, that will be solved when we upgrade to k8s 1.23!

Test the model after deployment:

aikochou@deploy1002:~/rrr$ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict" -d @input.json -H "Host: revertrisk-multilingual-predictor-default.experimental.wikimedia.org" --http1.1
{"lang": "en", "rev_id": 1096086751, "score": {"prediction": false, "probability": {"true": 0.3770119460413965, "false": 0.6229880539586035}}}
real	0m6.514s
user	0m0.010s
sys	0m0.004s
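The single `time curl` call above gives one latency sample. For the "measure the latency" step, a rough Python equivalent could repeat the same request and take the median. Assumptions: the contents of `input.json` are not shown, so the payload in the usage comment is hypothetical; the internal TLS certificate is assumed to be trusted on the host where this runs; the helper names are illustrative only.

```python
import json
import statistics
import time
import urllib.request

URL = "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-multilingual:predict"
HOST = "revertrisk-multilingual-predictor-default.experimental.wikimedia.org"


def make_request(payload: dict) -> urllib.request.Request:
    """Mirror the curl call: POST JSON, overriding the Host header for routing."""
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Host": HOST, "Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def measure(call, runs: int = 5):
    """Time `call()` `runs` times; return (median seconds, last result)."""
    timings, result = [], None
    for _ in range(runs):
        start = time.perf_counter()
        result = call()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), result


def summarize(response: dict) -> str:
    """One-line summary of a revert-risk response like the one above."""
    score = response["score"]
    p_true = score["probability"]["true"]
    return (f"rev {response['rev_id']} ({response['lang']}): "
            f"prediction={score['prediction']}, P(revert)={p_true:.3f}")


# Usage on a deploy host (hypothetical payload, assumed to match input.json):
# payload = {"lang": "en", "rev_id": 1096086751}
# median_s, resp = measure(lambda: send(make_request(payload)))
# print(f"median {median_s:.2f}s", summarize(resp))
```

Taking the median over several runs is less sensitive to a slow first request (for example a cold model or connection setup) than a single timed call.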

Models deployed to production as well, all good!

elukey moved this task from In Progress to Complete Q3 2022/23 on the Machine-Learning-Team board. · Mar 3 2023, 9:16 AM

elukey closed this task as Resolved. · Mar 14 2023, 2:59 PM
