
Deploy multilingual readability model to LiftWing
Closed, Resolved · Public

Description

We developed a multilingual model for readability. This model generates a score for Wikipedia articles capturing (some aspect of) how easy they are to read. For more details see: https://meta.wikimedia.org/wiki/Research:Multilingual_Readability_Research#An_improved_multilingual_model_for_readability

At the moment, the model lives on one of the stat-machines. The goal is to make the model's output available via Lift Wing.

Event Timeline


weekly update:

  • setting up documentation of the trained model.
    • we wrote up the summary of the results in a doc. we will be moving those to the project's meta-page.
    • we will be adding the code to the repo with an example notebook for predicting scores

weekly update:

  • started initial conversation with Diego (and Muniza and Aiko). Since the model relies on the same pipeline as the revert-risk model, the conclusion was that it should be possible to move it to LiftWing, in principle

weekly update

  • ongoing discussions between Mykola and Aiko/Muniza; getting feedback on repo
  • next step: drafting model card

weekly:

  • Mykola created a first draft of the model card. I will review and make suggestions/improvements if needed

Change 931987 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] readability: add readability model server

https://gerrit.wikimedia.org/r/931987

Change 931994 had a related patch set uploaded (by AikoChou; author: AikoChou):

[integration/config@master] inference-services: add readability pipelines

https://gerrit.wikimedia.org/r/931994

weekly update:

Change 931994 merged by jenkins-bot:

[integration/config@master] inference-services: add readability pipelines

https://gerrit.wikimedia.org/r/931994

Change 931987 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] readability: add readability model server

https://gerrit.wikimedia.org/r/931987

Change 934562 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: add readability isvc to experimental ns

https://gerrit.wikimedia.org/r/934562

Change 934562 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: add readability isvc to experimental ns

https://gerrit.wikimedia.org/r/934562

Change 934582 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: increase memory resources for readability isvc

https://gerrit.wikimedia.org/r/934582

Change 934582 abandoned by AikoChou:

[operations/deployment-charts@master] ml-services: increase memory resources for readability isvc

Reason:

not needed

https://gerrit.wikimedia.org/r/934582

Change 935068 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] readability: add nltk tokenizers download to blubber's builder

https://gerrit.wikimedia.org/r/935068

Change 935068 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] readability: add nltk tokenizers download to blubber's builder

https://gerrit.wikimedia.org/r/935068

Change 935676 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update readability docker image

https://gerrit.wikimedia.org/r/935676

Change 935676 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update readability docker image

https://gerrit.wikimedia.org/r/935676

The readability model has been deployed to LiftWing staging. It is available via an internal endpoint.

Test the model:

aikochou@deploy1002:~$ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/readability:predict" -X POST -d '{"lang": "en", "rev_id": 1161100049}' -H "Host: readability.experimental.wikimedia.org" --http1.1

{"model_name":"readability","model_version":"2","wiki_db":"enwiki","revision_id":1161100049,"output":{"prediction":true,"probabilities":{"true":0.8169194640857833,"false":0.1830805359142167},"fk_score":11.953445079550391}}
real	0m1.361s
user	0m0.014s
sys	0m0.001s
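For reference, the JSON response from the staging test above can be unpacked like this. A minimal sketch using only the standard library, with the sample payload copied verbatim from the curl output (the interpretation of the boolean classes is documented on the model card, not here):

```python
import json

# Sample response from the staging curl test above, copied verbatim.
raw = ('{"model_name":"readability","model_version":"2","wiki_db":"enwiki",'
       '"revision_id":1161100049,"output":{"prediction":true,'
       '"probabilities":{"true":0.8169194640857833,"false":0.1830805359142167},'
       '"fk_score":11.953445079550391}}')

resp = json.loads(raw)
output = resp["output"]

# "prediction" is a boolean class label; "probabilities" maps both
# class labels to scores that sum to 1.
prediction = output["prediction"]
confidence = output["probabilities"]["true" if prediction else "false"]

print(resp["wiki_db"], resp["revision_id"], prediction, round(confidence, 3))
```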

@achou this is great. I tried from the stat1008 and can confirm that this works.
Would it be possible to make it available publicly? I would like to access the endpoint from toolforge for a public API.
Thanks

Hi @MGerlach! We follow https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Hosting_stages_for_a_model_server_on_Lift_Wing to graduate a model to production, so we can start working on it if Research has the bandwidth to meet all the criteria (especially ownership, etc.). Let us know :)

@elukey thanks for the pointer.
We created a model card for the model which describes in detail the evaluation and specifies a point of contact (me): https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Multilingual_readability_model_card
Do you need any additional information or commitments from our side? I am unsure about some of the other requirements specified in the docs, such as the stability level, code quality, etc. Any guidance on how we can (help to) ensure that we meet those would be very helpful : )

Sure! What we are looking for is an indication of the commitment of the team (requesting the model server) to support the model in the long term. For example, say that this model server leads to bugs, HTTP 500s, etc., and the issue is in the model itself: we (as ML) would need some support from the Research team (even if you are away, etc.) to figure out what's wrong. I had a chat with @leila about this, and the idea was to have a limited list of models to publish in order to be able to better support them. From my point of view we are ready for it, but we'd need a sign-off from your team first :)

MGerlach renamed this task from (stretch) Deploy multilingual readability model to LiftWing to Deploy multilingual readability model to LiftWing. Jul 14 2023, 8:57 AM
MGerlach moved this task from In Progress to FY2023-24-Research-July-September on the Research board.

@elukey thanks for the additional context.
there are ongoing discussions in the Research Team around the level of commitment we can provide and sustain in the long run. Using this specific task as an example, we have started thinking about how to answer this question more generally for other potential models in the future as well. I would like to wait for these discussions to take place over the next week or so, and will then get back here when I have a clearer picture.

@elukey Research accepts accountability for the readability model for a period of 12 months (We will revisit then if we want to continue being accountable. If yes, we renew. If no, we let you know and you can stop the model.). Accountability means that we will assure on our end there is always someone who can pick up the work related to updating the model as prioritized, and that we triage incoming tasks relevant to our team. If this works for you, we're good to go.

We will continue working on our end and with your team to further clarify accountability details. :)

weekly update:

  • coordinating next steps with folks from ML team. my understanding is that work will be picked up by them in the next week (thanks!)

Change 951460 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] profile::k8s::deployment_server: Add config for readability isvc

https://gerrit.wikimedia.org/r/951460

Change 951461 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/deployment-charts@master] helmfile.d: Add config bits to move readability isvc to prod

https://gerrit.wikimedia.org/r/951461

weekly update:

Hi @leila! This is great, but it has a big downside - if we expose readability models to the outside community (via the API Gateway) we'll have clients that (rightfully) will base their jobs/bots/dashboards/etc.. on them, and the 12 months review time worries me a bit. We are doing a lot of work to migrate the community using the ORES API to Lift Wing, experiencing how difficult it is to ask a wide variety of projects to change/adapt their code. I am not saying that the 12 months timeline is not good, but we should also think about deprecation paths if we decide to remove a model from production after some time (so some extra work/help from Research may be needed to move the community to other models etc..). Lemme know your thoughts :)

@MGerlach I added a step in https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Hosting_stages_for_a_model_server_on_Lift_Wing, namely:

A basic load test is performed to figure out (indicatively) how many rps the model server can sustain (in staging). The ML team and the model owner set a target SLO for the service.

The load test part is a simple test to figure out, varying the inputs, how the model server behaves in staging (namely, how many rps it can sustain without slowing down, etc.). We don't have any clear docs about it, but I'll try to create some to help out (and I'll publish results in here).

The SLO part is newer and more difficult; we can try to discuss it in a meeting if you prefer, but it is essentially the level of availability that we want to set for the model server. Maybe we can discuss these points during the next research/ml sync?

@elukey thanks. let me know how I can help with the load test. I am happy to discuss the point about SLO and will try to attend the ML/Research meeting later today.

weekly update:

  • met with Luca and Aiko to discuss load testing and SLO T334182#9130664
  • we agreed that it makes sense to address these questions before deploying publicly
  • we will figure out details jointly along the way; ML Team will lead this investigation with input from Research

I conducted some load tests on the readability model in staging using the same input and script as we did for revert-risk (code), as they share the same input parameters. The results can be found here: P52406.

The model's latency profile is similar to the revertrisk-multilingual model's, as both use pre-trained mBERT. Therefore, we may need to set a low value for the autoscaling.knative.dev/target annotation, e.g. 3 or 4.
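The actual load test used the revert-risk script linked above; purely as an illustration of what such a test measures, here is a self-contained sketch. All names (load_test, timed_call) are illustrative, and the time.sleep stub stands in for a real HTTP POST to the :predict endpoint:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(request_fn, total_requests=100, concurrency=4):
    """Fire total_requests calls of request_fn across concurrency
    workers; report latency percentiles and effective rps."""
    def timed_call(_):
        start = time.monotonic()
        request_fn()
        return time.monotonic() - start

    wall_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    wall = time.monotonic() - wall_start

    return {
        "rps": total_requests / wall,
        "p50": statistics.median(latencies),
        "p99": latencies[int(0.99 * (len(latencies) - 1))],
    }

# Stand-in for an HTTP call to the model server; a real test would POST
# {"lang": ..., "rev_id": ...} to the :predict endpoint instead.
stats = load_test(lambda: time.sleep(0.01), total_requests=50, concurrency=5)
print(stats)
```

Varying concurrency while watching p99 is what reveals the sustainable rps that the autoscaling target should be tuned against.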

Great results @achou!

@MGerlach before proceeding, do you have any plan for the model? I mean, are there any known consumers/clients that will use it, or is it just a new endpoint to test? I am asking since the traffic handled per second seems moderate, so we'd need to refine/improve it a little if any client/consumer has higher performance demands. If not we can proceed and move the service to production, let us know!

There are no known external consumers at the moment since it is a new endpoint to test. Currently, the model output will be used to i) provide a metric for the knowledge gaps index and ii) provide the scores for the readability-tool on Toolforge. Especially for the latter, we are planning to advertise more once this is live, so demand could rise in the future (but I assume we could then adapt depending on what happens). Let me know if I should provide more details. Thanks.

Thanks for the info @MGerlach!

In my opinion we are ok to proceed. @klausman @achou (if you agree to proceed as well) - when you have time could you please coordinate and move readability to Prod?

Change 951461 merged by jenkins-bot:

[operations/deployment-charts@master] helmfile.d: Add config bits to move readability isvc to prod

https://gerrit.wikimedia.org/r/951461

Change 951460 merged by Klausman:

[operations/puppet@production] profile::k8s::deployment_server: Add config for readability isvc

https://gerrit.wikimedia.org/r/951460

The service has been moved from the experimental namespace to the readability namespace in staging-codfw, and newly deployed to the same namespace in serve-codfw and serve-eqiad.

Queries work as expected:

$ curl -s "https://inference.svc.codfw.wmnet:30443/v1/models/readability:predict" -H Host:\ readability.readability.wikimedia.org -X POST -d @input-readability-1.json|jq .
{
  "model_name": "readability",
  "model_version": "2",
  "wiki_db": "enwiki",
  "revision_id": "123456",
  "output": {
    "prediction": false,
    "probabilities": {
      "true": 0.4793056845664978,
      "false": 0.5206943154335022
    },
    "fk_score": 8.277095534538086
  }
}
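The fk_score field in the responses above presumably corresponds to the Flesch-Kincaid grade level (an assumption based on the field name; this task does not spell it out). The standard formula is simple to reproduce; the word/sentence/syllable counts would have to come from a tokenizer (the nltk tokenizers patch above suggests NLTK is used server-side, though how the counts are produced there is not specified):

```python
def fk_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level: higher means harder to read.
    Standard published coefficients; callers supply the counts."""
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

# e.g. 100 words in 5 sentences with 150 syllables:
print(round(fk_grade(100, 5, 150), 2))  # → 9.91
```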

@elukey (to capture our sync conversation results w.r.t. your comment above):

  • We understand that you/we may not be able to just take models out of Production, as models will have users (internal/external). At this moment, if we are asking you to bring an existing model that Research has developed out of Production, I commit that our team works with you to make sure the transition is done gracefully.
  • We discussed the need for a "research/experimental" environment for models that we don't want your team to spend time maintaining in Production but that we still want to expose to specific user groups, in line with your team's ask that we start working with you early on when we develop models, even experimental ones. You had multiple good suggestions and clarifications on this front. I'll continue those conversations with you outside of this task.

Thanks to you and the team for your continued collaboration.

The service is currently deployed to production! It is only available for internal clients.

Next steps:

  • Publish the service via api.wikimedia.org (API Gateway).
  • Add basic documentation to the API Portal.
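Once the API Gateway entry is live, an external client would build its request roughly as follows. This is a sketch only: the public URL is an assumption following the usual Lift Wing path pattern on api.wikimedia.org, and the authoritative endpoint is whatever the API Portal documentation ends up stating. The payload shape matches the curl examples throughout this task:

```python
import json

# Assumed public path, following the common Lift Wing pattern on the
# API Gateway; check the API Portal docs for the authoritative URL.
ENDPOINT = ("https://api.wikimedia.org/service/lw/inference/"
            "v1/models/readability:predict")

def build_request(lang, rev_id):
    """Assemble the URL and JSON body used in this task's curl examples."""
    return ENDPOINT, json.dumps({"lang": lang, "rev_id": rev_id})

url, body = build_request("en", 1161100049)
print(url)
print(body)
```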

elukey reassigned this task from achou to klausman.

@klausman Assigned the task to you since there are a couple of steps that are more related to SRE (lemme know if you don't have time, I'll take care of it).

Change 959684 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/deployment-charts@master] APIGW: add entry for multilingual readability LW isvc

https://gerrit.wikimedia.org/r/959684

Change 959684 merged by jenkins-bot:

[operations/deployment-charts@master] APIGW: add entry for multilingual readability LW isvc

https://gerrit.wikimedia.org/r/959684

Change 961701 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/grafana-grizzly@master] SLOs: Add SLO for Liftwing Readability isvc

https://gerrit.wikimedia.org/r/961701

Change 961701 merged by Klausman:

[operations/grafana-grizzly@master] SLOs: Add SLO for Liftwing Readability isvc

https://gerrit.wikimedia.org/r/961701

@MGerlach we are done! Let us know if we are good or if anything is missing :)

MGerlach closed this task as Resolved. Edited Oct 26 2023, 5:53 PM
MGerlach added a subscriber: AikoChou.

this is great news. I had a look and all seems to be working as expected. thanks to everyone who contributed to making this happen (@elukey @achou @klausman)

from my side the task is resolved. any other improvements or changes will be captured in follow-up tasks.