
Request to update Readability model on Lift Wing
Closed, Resolved · Public · 3 Estimated Story Points

Description

We have substantially improved the readability model by switching from a classification to a ranking approach (see TRank vs. LMC in the paper's results). Currently, the model on Lift Wing still uses the older LMC (classification) model. If possible, we would like to replace it with the new TRank model.

See below for more details following the instructions for requesting an update.

Which model needs updating?

We would like to update the Readability model: https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Readability_score_object and https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_readability_prediction

What changes have been made to the model? (e.g., updated training data, different approach, new features, etc.)

The model was trained using a different approach (ranking instead of classification). The FK-score approximation was changed from a nonlinear regression model to a simple linear transformation of the ranking score.
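For illustration, a minimal sketch of what such a linear transformation could look like; the slope and intercept below are made-up placeholders, not the actual fitted parameters:

import dataclasses

# Hypothetical sketch: approximate the Flesch-Kincaid score as a linear
# function of the ranking model's raw score. The coefficients are
# placeholders; the real values come from fitting against FK grade levels.
FK_SLOPE = -3.2      # placeholder coefficient
FK_INTERCEPT = 7.5   # placeholder intercept

def fk_score_proxy(score: float) -> float:
    """Approximate the Flesch-Kincaid grade level from the ranking score."""
    return FK_SLOPE * score + FK_INTERCEPT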

Do any dependent repositories/packages need updates? (e.g., knowledge integrity, sklearn, pytorch, etc.) Please provide the MR/version for reference.

The readability-liftwing package should be updated according to this MR: https://gitlab.wikimedia.org/trokhymovych/readability-liftwing/-/merge_requests/6

Is there a new model binary? What is its version?

Yes. Its version is 4 (the same as mentioned in the MR). The new model binary can be found here: https://drive.google.com/file/d/1wsmx5nw2_EtrRlA2RfDXBEiO-SivPYcU/view?usp=sharing

Does the input/output schema need any changes?

The output schema needs to be changed. The new schema is represented as

from dataclasses import dataclass

@dataclass
class ReadabilityResult:
    score: float
    fk_score_proxy: float

where score stands for the readability score provided by the ranking model (it can be used to compare articles with one another), and fk_score_proxy is a Flesch–Kincaid score approximation. The schema differs because the model is now formulated as a ranking task (assigning only a score) instead of a binary prediction task (True/False with a probability for each class).
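As a usage illustration, assuming the ReadabilityResult schema above, the ranking scores of two articles can be compared directly (the values below are made up, and the direction of the scale is defined by the model):

from dataclasses import dataclass

@dataclass
class ReadabilityResult:
    score: float
    fk_score_proxy: float

# Hypothetical results for two articles; the values are placeholders.
article_a = ReadabilityResult(score=-0.29, fk_score_proxy=8.63)
article_b = ReadabilityResult(score=0.41, fk_score_proxy=6.10)

# Ranking scores are directly comparable between articles,
# unlike the old True/False classification output.
if article_a.score > article_b.score:
    print("article_a ranks higher on the readability scale")
else:
    print("article_b ranks higher on the readability scale")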

Does the preprocessing stage require changes?
Does the prediction stage require changes?

The preprocessing and prediction stages have minor changes (due to the model change), which are reflected in the corresponding MR.

Checklist:

Additional note: the set of supported languages changed slightly because we use a different base model (see the model card).

'af', 'sq', 'am', 'ar', 'hy', 'as', 'az', 'eu', 'be', 'bn', 'bs', 'br', 'bg', 'my', 'ca', 'zh-yue', 'zh', 'zh-classical', 'hr', 'cs', 'da', 'nl', 'en', 'eo', 'et', 'tl', 'fi', 'fr', 'gl', 'ka', 'de', 'el', 'gu', 'ha', 'he', 'hi', 'hu', 'is', 'id', 'ga', 'it', 'ja', 'jv', 'kn', 'kk', 'km', 'ko', 'ku', 'ky', 'lo', 'la', 'lv', 'lt', 'mk', 'mg', 'ms', 'ml', 'mr', 'mn', 'ne', 'no', 'or', 'om', 'ps', 'fa', 'pl', 'pt', 'pa', 'ro', 'ru', 'sa', 'gd', 'sr', 'sd', 'si', 'sk', 'sl', 'so', 'es', 'su', 'sw', 'sv', 'ta', 'te', 'th', 'tr', 'uk', 'ur', 'ug', 'uz', 'vi', 'cy', 'fy', 'xh', 'yi', 'simple'
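For illustration only, a request validator might check the requested language against this set before calling the model; this is a sketch under assumed behaviour, not the actual Lift Wing service's handling of unsupported languages:

# Sketch: validate the "lang" parameter against the supported set before
# invoking the model. The error handling here is an assumption.
SUPPORTED_LANGUAGES = {
    "af", "sq", "am", "ar", "hy", "as", "az", "eu", "be", "bn",
    # ... remaining codes from the list above ...
    "cy", "fy", "xh", "yi", "simple",
}

def validate_lang(lang: str) -> None:
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Language '{lang}' is not supported by the readability model")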

Event Timeline

calbon set the point value for this task to 3.
calbon moved this task from Unsorted to Ready To Go on the Machine-Learning-Team board.

@Trokhymovych I'm starting to work on this. Is the prediction time similar to the previous model's, or does it take more/less time? I just wanted to get some numbers on how the model performs with expected inputs. Thanks!

Change #1059032 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] readability: updates according to the new TRank model

https://gerrit.wikimedia.org/r/1059032

Hi @achou! Thanks for working on this. Prediction time should be similar to the previous model. I have checked locally, and it is 2.5s per page on average (4s at the 95th percentile). However, the model should require more RAM.
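For reference, a rough sketch of how these per-page numbers could be measured locally (the endpoint URL and rev_ids are placeholders, not the actual test setup):

import time
import requests

# Placeholder values: substitute a real endpoint and a sample of rev_ids.
ENDPOINT = "http://localhost:8080/v1/models/readability:predict"
REV_IDS = [123456, 234567, 345678]

latencies = []
for rev_id in REV_IDS:
    start = time.monotonic()
    requests.post(ENDPOINT, json={"rev_id": rev_id, "lang": "en"}, timeout=60)
    latencies.append(time.monotonic() - start)

latencies.sort()
mean = sum(latencies) / len(latencies)
p95 = latencies[int(0.95 * (len(latencies) - 1))]  # rough 95th percentile
print(f"mean: {mean:.2f}s, p95: {p95:.2f}s")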

Also, I have already merged the corresponding MR.

Change #1059032 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] readability: updates according to the new TRank model

https://gerrit.wikimedia.org/r/1059032

Change #1060437 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update readability model

https://gerrit.wikimedia.org/r/1060437

Change #1060437 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update readability model

https://gerrit.wikimedia.org/r/1060437

Change #1061948 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] Makefile: update readability model path for local-run

https://gerrit.wikimedia.org/r/1061948

Change #1061948 merged by Kevin Bazira:

[machinelearning/liftwing/inference-services@main] Makefile: update readability model path for local-run

https://gerrit.wikimedia.org/r/1061948

Change #1062680 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] locust: entry for readability model

https://gerrit.wikimedia.org/r/1062680

Change #1062680 merged by Ilias Sarantopoulos:

[machinelearning/liftwing/inference-services@main] locust: entry for readability model

https://gerrit.wikimedia.org/r/1062680
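For context, a minimal sketch of what a Locust entry for this model might look like; the host, endpoint path, and payload below are assumptions for illustration, not the contents of the merged patch:

from locust import HttpUser, task, between

class ReadabilityUser(HttpUser):
    # Placeholder host; load tests target the staging isvc in practice.
    host = "http://localhost:8080"
    wait_time = between(1, 2)

    @task
    def predict(self):
        # Payload values are placeholders for illustration.
        self.client.post(
            "/v1/models/readability:predict",
            json={"rev_id": 123456, "lang": "en"},
        )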

We've deployed the model to ml-staging. Initially, the service was crashlooping due to out-of-memory errors. The issue was resolved after increasing the memory to 4Gi (the patch). Mykola mentioned that the new model would require more RAM.

However, we observed higher latency compared to the old model during load tests. The old model averages 4.5s per request at 0.27 req/s, while the new model averages 8.7s per request at 0.17 req/s. Here are the load test results, performed on this input data.

On the latency dashboard, the old model's numbers vary a lot between pages, whereas the new model takes a similar prediction time for every page.

@Trokhymovych, could you run the same test you reported in T369712#10038210 with the previous model? Is the prediction time similar to the result of the new model?

Hi @achou, thanks so much for your work! I’ve run the tests and can confirm the scale of your observations. The old model averages 1.07s per item, while the new model averages 2.52s per item on the same data, meaning the new model is indeed about twice as slow. (Absolute numbers may vary depending on CPU and connection speed.) My initial assumption that their performance was "similar" was incorrect. I hope this information is helpful.

@Trokhymovych thanks for clarifying it. That's super helpful! :)

Change #1064391 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: bump memory for readability isvc in prod

https://gerrit.wikimedia.org/r/1064391

Change #1064391 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: bump memory for readability isvc in prod

https://gerrit.wikimedia.org/r/1064391

The new model has been deployed to production.

$ curl https://api.wikimedia.org/service/lw/inference/v1/models/readability:predict -X POST -d '{"rev_id": 123456, "lang": "en"}' -H "Content-type: application/json" | jq '.'
{
  "model_name": "readability",
  "model_version": "4",
  "wiki_db": "enwiki",
  "revision_id": 123456,
  "output": {
    "score": -0.29161882400512695,
    "fk_score_proxy": 8.63213539862886
  }
}

Also, the API docs have been updated: https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Readability_score_object and https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_readability_prediction

@Trokhymovych Let me know if you have any questions. :)