Expand language support for Revert Risk Model
Closed, Resolved (Public)

Assigned To

Authored By

	MunizaA
	Sep 25 2023, 6:08 PM

Description

Technically this model is language agnostic, but it does require some statistical values for every wiki in order to calculate quality features:

avg article length
avg number of media
avg number of categories
avg number of headings
avg number of wikilinks
avg number of references

This task involves adding these values for new languages and updating the model binary to accurately reflect the total number of supported wikis.
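As a rough illustration of why every supported wiki needs its own entry, the sketch below scales raw per-revision counts by that wiki's averages. The table name `WIKI_STATS`, the key names, the helper `relative_features`, and all numbers are hypothetical; the actual layout of constants.py and the exact formula used by the model may differ.

```python
# Hypothetical per-wiki averages; the numbers are made up for illustration.
WIKI_STATS = {
    "examplewiki": {
        "article_length": 5000.0,
        "media": 2.0,
        "categories": 4.0,
        "headings": 5.0,
        "wikilinks": 40.0,
        "references": 10.0,
    },
}


def relative_features(wiki_db, raw_counts):
    """Scale raw counts by the wiki's averages (assumed normalisation)."""
    try:
        averages = WIKI_STATS[wiki_db]
    except KeyError:
        # Without per-wiki averages the quality features cannot be computed.
        raise ValueError(f"{wiki_db} is not a supported wiki") from None
    return {name: raw_counts[name] / averages[name] for name in averages}
```

Under this assumption, a wiki with no entry cannot be scored at all, which is why adding the values is what makes a language supported.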

Add quality feature values for 35 new languages to constants.py using this new file created by @diego (See the commit message for more details on how default values were generated for wikis)
Update the supported_wikis attribute on the serialized RevertRiskModel
Bump model version from 1.0 to 2.0
Test the new model binary
Pass it on to the ML team (sha512 checksum for the serialized model: P52800)
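The attribute update and version bump in the steps above could look roughly like the following. This is a sketch only: it assumes the model is a pickle file, uses a `SimpleNamespace` as a stand-in for the real RevertRiskModel object, and the attribute name `model_version` and helper `expand_supported_wikis` are hypothetical (only `supported_wikis` is named in this task).

```python
import pickle
from types import SimpleNamespace


def expand_supported_wikis(blob, new_wikis, new_version):
    """Return a re-serialized model with more wikis and a new version.

    `blob` must come from a trusted source: unpickling runs arbitrary code.
    """
    model = pickle.loads(blob)
    # Merge and de-duplicate so re-running the update is harmless.
    model.supported_wikis = sorted(set(model.supported_wikis) | set(new_wikis))
    model.model_version = new_version  # hypothetical attribute name
    return pickle.dumps(model)


# Stand-in for the old model binary, for illustration only.
old_blob = pickle.dumps(
    SimpleNamespace(model_version="1.0", supported_wikis=["enwiki"])
)
new_blob = expand_supported_wikis(old_blob, ["be_taraskwiki"], "2.0")
```

Loading `new_blob` and checking `supported_wikis` and the version would be a minimal form of the "test the new model binary" step.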

Details

Due Date: Sep 28 2023, 7:00 AM

	Subject	Repo	Branch	Lines +/-
	ml-services: update revertrisk-language-agnostic model binary	operations/deployment-charts	master	+2 -2
	revertrisk-la: bump knowledge_integrity version to v0.4.0	machinelearning/liftwing/inference-services	main	+1 -1


	Title	Reference	Author	Source Branch	Dest Branch
	Expand language support for RevertRiskModel	repos/research/knowledge_integrity!23	mnz	mnz/quality-feature-vals	main


Related Objects

Mentioned In: T347136: Review Revert Risk reports from WME
rMLISfe4007354ab0: revertrisk-la: bump knowledge_integrity version to v0.4.0
Mentioned Here: P52800 SHA-512 checksum for Revert Risk language agnostic model V2

Event Timeline

MunizaA changed the task status from Open to In Progress. Sep 25 2023, 6:08 PM

MunizaA created this task.

MunizaA set Due Date to Sep 28 2023, 7:00 AM. Sep 25 2023, 6:12 PM

MunizaA moved this task from Backlog to In Progress on the Research board.

MunizaA updated the task description. Sep 26 2023, 2:16 PM

@diego, it's possible I'm missing something but while updating these constants I noticed that the values for be-x-old and be-tarask are different and according to the information here, the former redirects to the latter. be-tarask is one of the new wikis that we're adding these values for, so I wanted to check with you if this is okay. Thanks!

Let's use the updated CSV for now. Later, let's coordinate with @fkaelin to periodically update these values for both the RRLA and Article quality models.

MunizaA updated the task description. Sep 27 2023, 6:40 PM

mnz opened https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/23

Expand language support for RevertRiskModel

achou added a project: Machine-Learning-Team. Sep 29 2023, 2:49 PM

achou moved this task from Unsorted to Watching on the Machine-Learning-Team board.

Change 962049 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic model binary

https://gerrit.wikimedia.org/r/962049

mnz merged https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/23

Expand language support for RevertRiskModel

MunizaA updated the task description. Sep 29 2023, 4:24 PM

@achou thanks again for the review! I've released v0.4.0 for Knowledge Integrity which should help you pick up these new changes. Going to close this now but please feel free to reopen if something does not look right!

Change 962066 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk-la: bump knowledge_integrity version to v0.4.0

https://gerrit.wikimedia.org/r/962066

Change 962066 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk-la: bump knowledge_integrity version to v0.4.0

https://gerrit.wikimedia.org/r/962066

achou mentioned this in rMLISfe4007354ab0: revertrisk-la: bump knowledge_integrity version to v0.4.0. Oct 2 2023, 7:57 AM

Change 962049 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update revertrisk-language-agnostic model binary

https://gerrit.wikimedia.org/r/962049

Maintenance_bot removed a project: Patch-For-Review. Oct 2 2023, 8:30 AM

MunizaA updated the task description. Oct 2 2023, 11:07 AM

Thanks @MunizaA for adding the sha512 checksum for the new model binary in the task description. I have verified it and confirmed the integrity of the file that we uploaded to Swift. In the future, we will do this step before uploading to make sure the file wasn't tampered with or miscopied. :)

@achou @MunizaA thanks a lot! One nit: the paste linked in the task's description is editable, so in theory anybody can tamper with it (everything is logged, but it may not be straightforward for ML to check, etc.). I would personally suggest adding the sha512 in a separate phab comment, which (in theory) can only be edited by its author.

Adding the sha512 here:

94ff70cbfac87565b5e04480acd7accd7d0c1f424ebfc2cb858338bc62c309b3745220223489498254010fb772266abd5498e2671eb1cddc138717e663bd3922 *revert_risk_language_agnostic_model_v2.pkl
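For reference, a check like the one described in the comments above can be done with Python's standard library. This is a minimal sketch, not the procedure the ML team actually ran; the helper names are made up here, and the parser assumes the `sha512sum`-style layout of the line above (digest, a space, a one-character mode marker such as `*`, then the filename).

```python
import hashlib
import hmac


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model binaries need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Split '<digest> <mode><filename>' into (digest, filename)."""
    digest, _, rest = line.strip().partition(" ")
    # rest[0] is the mode marker: '*' for binary, ' ' for text.
    return digest.lower(), rest[1:]


def matches_checksum_line(path, line):
    """True if the file at `path` hashes to the digest recorded in `line`."""
    expected, _name = parse_checksum_line(line)
    return hmac.compare_digest(sha512_of_file(path), expected)
```

Running `matches_checksum_line` on the downloaded file before uploading it to Swift would cover the "verify first" order suggested above.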

achou mentioned this in T347136: Review Revert Risk reports from WME. Oct 2 2023, 2:15 PM

calbon moved this task from Watching to 2023-2024 Q3 Done on the Machine-Learning-Team board. Nov 29 2023, 2:17 PM
