
Request to host the Reference Need Model on LiftWing
Closed, Resolved · Public · 5 Estimated Story Points

Description

  • What use case is the model going to support/resolve?

Enterprise would like to support users who are interested in understanding the level of "safety" of each revision they receive, with as much granularity as possible.

  • Do you have a model card?

https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Multilingual_reference_need

  • What team created/trained/etc. the model?

Research (@diego and @Aitolkyn)

  • What tools and frameworks have you used?

Mainly the transformers library and pytorch. See the full list of dependencies here.

  • What kind of data was the model trained with?

From the model card:

The model was trained on a set of featured articles, which editors have identified as the highest-quality articles on Wikipedia.
We used the mediawiki_wikitext_current table to extract the latest available revision for each featured article. The snapshot used was 2024-02.
Number of languages: 5 ('ru', 'es', 'de', 'fr', 'en')
Number of sentences: 100,000
Random sample of 20,000 sentences from each language balanced on the ground-truth label.

  • What kind of data is the model going to need in production (for example, calls to internal/external services, special data sources for features, etc.)?

To predict the reference need for a revision, its content, i.e. the revision text, is required.
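As an illustration of obtaining that input, the revision text can be fetched from the public MediaWiki API. This is a minimal sketch (the helper name is hypothetical; the service itself retrieves content internally):

```python
import json
from urllib import parse, request


def revision_query_url(rev_id: int, lang: str) -> str:
    """Build a MediaWiki API URL that returns the wikitext of a revision."""
    params = parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
    })
    return f"https://{lang}.wikipedia.org/w/api.php?{params}"


url = revision_query_url(123456, "en")
# data = json.load(request.urlopen(url))  # requires network access
```

The `rvslots=main` parameter selects the main content slot, which holds the wikitext that the model scores.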

  • If you have a minimal codebase that you used to run the first tests with the model, could you please share it?

The original source for the model lives in the reference-quality repo. A refactored version of it has since been added to knowledge-integrity and can be used by installing v0.8.2.

  • State what team will own the model, and please share the main points of contact.

Research (@diego)

  • What is the current latency and throughput of the model, if you have tested it?

The latency scales linearly with the number of uncited sentences in the revision text. At the moment, 70% of test articles can be processed in under 500 ms, while the rest exceed this time limit.

  • Is there an expected frequency in which the model will have to be retrained with new data? What are the resources required to train the model and what was the dataset size?
  • Have you checked if the output of your model is safe from a human rights point of view? Is there any risk of it being offensive for somebody? Even if you have any slight worry or corner case, please tell us!

@FNavas-foundation to comment?

Event Timeline

MunizaA renamed this task from Request to host Reference Quality Model on Lift Wing to Request to host the Reference Need Model on LiftWing. Aug 26 2024, 12:11 PM
MunizaA updated the task description.
MunizaA added a subscriber: Aitolkyn.

Hi @Aitolkyn, could you provide the location of the model (e.g. directory on stat100x or a google drive link) and its sha512 checksum? You can generate the sha512 using the following command:

  • Example
$ sha512sum -b model.bin
d1bbf9173091b45a8940f14cd4b3b113374d85c88b5bc5e09c2f6d5676084013cff91a65c1da161c2e970274a0a80b2392b3cc02c40f385602786562fd5a5d3f *model.bin

Hi Aiko! The location on stat1010 is /home/aitolkyn/temp/reference-quality/pretrained_models/multilingual_reference_need_128_v0.pkl
sha512: 0af0ecd12e05e7c40a0d39dd155589917130d1fa00711c3675c48d4373edca402bdc25cb85a56925deb24ebcf3c0ac01843179c86321f0991772b8963c27ed24 *multilingual_reference_need_128_v0.pkl

Change #1070060 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] reference-need: initial commit

https://gerrit.wikimedia.org/r/1070060

Hi @Aitolkyn!
Is 1.13.1 the pytorch version that was used during training, as shown in the reference-need repo?
If so, I was wondering if there is a specific need for this version, or whether we could aim for one of the latest versions in order to be more future-proof, since 1.13.1 is more than 1.5 years old.
By future-proof I mean supporting a newer Python version (e.g. Python 3.12).

PS. I know such a change would mean making changes to knowledge-integrity repo. Perhaps over there we could relax the requirements to allow for later versions if needed.

cc: @MunizaA

Hi @isarantopoulos, the pytorch version was pinned in knowledge-integrity when the transformers dependency was added. I was under the impression that this was because transformers specifies an upper bound on pytorch, and we can't upgrade transformers since it's not backward compatible and breaks models trained on older versions. But it looks like there's only a lower bound, so technically we should be able to upgrade to the latest version.

I mean supporting a newer python version (e.g. python 3.12).

Though upgrading pytorch might not be enough for this. There are other knowledge-integrity dependencies that don't ship pre-built wheels for newer versions of Python (e.g. scikit-learn for Python 3.11), and building them from source has been painful. Upgrading these would also require retraining or re-serializing older models.

Hello @isarantopoulos! We downgraded to match the version in the knowledge-integrity repo.

Change #1070060 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] reference-need: initial commit

https://gerrit.wikimedia.org/r/1070060

Change #1070941 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] Makefile: add support for reference-need

https://gerrit.wikimedia.org/r/1070941

Change #1070941 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] Makefile: add support for reference-need

https://gerrit.wikimedia.org/r/1070941

Change #1071818 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] ci: add blubber for reference-need

https://gerrit.wikimedia.org/r/1071818

Change #1071824 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] reference-quality: add CI pipelines to config.yaml

https://gerrit.wikimedia.org/r/1071824

Change #1071825 had a related patch set uploaded (by AikoChou; author: AikoChou):

[integration/config@master] inference-services: add CI jobs for reference-quality

https://gerrit.wikimedia.org/r/1071825

Change #1071818 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] ci: add blubber for reference-need

https://gerrit.wikimedia.org/r/1071818

Change #1071825 merged by jenkins-bot:

[integration/config@master] inference-services: add CI jobs for reference-quality

https://gerrit.wikimedia.org/r/1071825

Change #1071824 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] reference-quality: add CI pipelines to config.yaml

https://gerrit.wikimedia.org/r/1071824

Change #1072193 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] admin_ng/LiftWing: add revision-models namespace

https://gerrit.wikimedia.org/r/1072193

Change #1072197 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/puppet@production] hiera/deployment-server: create revision-models config/roles

https://gerrit.wikimedia.org/r/1072197

Change #1072197 merged by Klausman:

[operations/puppet@production] hiera/deployment-server: create revision-models config/roles

https://gerrit.wikimedia.org/r/1072197

Change #1072193 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng/LiftWing: add revision-models namespace

https://gerrit.wikimedia.org/r/1072193

Change #1072252 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: add ref-quality isvc to experimental ns

https://gerrit.wikimedia.org/r/1072252

Change #1072252 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deploy ref-quality isvc in experimental ns

https://gerrit.wikimedia.org/r/1072252

Change #1072541 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] locust: entry for reference_quality model

https://gerrit.wikimedia.org/r/1072541

Change #1072541 merged by AikoChou:

[machinelearning/liftwing/inference-services@main] locust: entry for reference_quality models

https://gerrit.wikimedia.org/r/1072541

Change #1073404 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: deploy ref-quality to prod in new ns

https://gerrit.wikimedia.org/r/1073404

Change #1073404 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deploy ref-quality to prod in new ns

https://gerrit.wikimedia.org/r/1073404

Hi, the reference-need model has been deployed to production.

Example usage:

$ curl "https://inference.discovery.wmnet:30443/v1/models/reference-need:predict" -X POST -d '{"rev_id": 123456, "lang": "en"}' -H  "Host: reference-quality.revision-models.wikimedia.org"
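For programmatic access, the same request can be built in Python. A minimal sketch using only the standard library; the endpoint and Host header are taken from the curl example and require access to the internal wmnet network:

```python
import json
from urllib import request

# Endpoint and Host header as in the curl example above.
URL = "https://inference.discovery.wmnet:30443/v1/models/reference-need:predict"
HOST = "reference-quality.revision-models.wikimedia.org"


def build_request(rev_id: int, lang: str) -> request.Request:
    """Build the POST request for the reference-need predict endpoint."""
    payload = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    return request.Request(
        URL,
        data=payload,
        method="POST",
        headers={"Host": HOST, "Content-Type": "application/json"},
    )


req = build_request(123456, "en")
# response = request.urlopen(req)  # only works from inside the internal network
```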

We ran load tests in staging, and the results are shown here. The average latency for sample_all.input is 1.129s, while for sample_top_view.input it's 13.352s.

To examine the preprocess and predict times more closely, we can check the KServe isvc dashboard (sample_all.input was used in this test), and we see that most of the time is spent on prediction.

The CPU and memory resources for the service are currently set to 4 CPUs and 2 Gi of memory.

achou set the point value for this task to 5. Sep 24 2024, 2:40 PM

Change #1077024 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update ref-need model and increase cpu and memory

https://gerrit.wikimedia.org/r/1077024

Change #1077024 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update ref-need model and increase cpu and memory

https://gerrit.wikimedia.org/r/1077024

A new reference-need model has been deployed to production. This model uses a distilled version of multilingual BERT and dynamic quantization, which improves prediction time. It's now coupled with the reference-risk model under a single service called "reference-quality".

Load testing results:

  • previous model: latency average 1016ms, median 710ms
  • current model: latency average 412ms, median 330ms

Please also see updates in T372405#10233107.
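The dynamic-quantization step mentioned above can be sketched in PyTorch. This is a toy illustration on a small stack of linear layers, not the actual distilled BERT used by the service:

```python
import torch
import torch.nn as nn

# Toy stand-in for the model; the real service quantizes a distilled
# multilingual BERT.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly, which typically speeds up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out = quantized(x)
```

The quantized model keeps the same input/output interface, so it can be swapped into the serving path without API changes.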

Thanks @achou, and also to @Aitolkyn and @MunizaA, you all did amazing work on making this model faster! The speedup is really impressive, and you used cutting-edge methods to make it possible. This improvement makes a huge difference from the final user's perspective, especially for the WME use case.

This work is a great example of how the Research and ML teams can collaborate to achieve what once seemed unfeasible with our previous knowledge alone. Once again, thank you all.

Change #1124748 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] Makefile: rename reference-need config to reference-quality

https://gerrit.wikimedia.org/r/1124748

Change #1124748 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Makefile: rename reference-need config to reference-quality

https://gerrit.wikimedia.org/r/1124748