Develop a ML-based service to predict reverts on Wikipedia(s)
Open, In Progress, High, Public

Description

The Research team, in collaboration with the ML-Platform team, is creating a new service to help patrollers detect revisions that might be reverted.

Requirements:

  • A single model for all Wikipedia languages (using wiki_db as a parameter)
  • The model should be primarily language-agnostic (see the subtasks)
  • The model should be able to run on single revisions or on batches
  • The model should be able to run on Lift Wing

Please follow the progress of this project on the related tasks.

Event Timeline

Reedy renamed this task from Develop a ML-based service to predict reverts on Wikipedia(s) to Develop a ML-based service to predict reverts on Wikipedia(s). Aug 2 2022, 12:47 PM
diego changed the task status from Open to In Progress. Aug 2 2022, 12:53 PM
diego claimed this task.
diego triaged this task as High priority.
diego added projects: Research, Epic.
diego updated the task description.
diego added subscribers: calbon, AikoChou, MunizaA.
diego added a subscriber: leila.

It has been decided to focus on knowledge integrity risks from two categories of our taxonomy:

  • Content: prevalence and response to vandalism (using data generated from T314384)
  • Community: capacity (shortage of resources in content moderation | admin burnout), governance (barriers to adminship rights) and demographics (geographical diversity of editors/readers)

I think this comment shouldn't go on this task.

For the record, here is a snippet (by @achou) for trying the models from within the WMF cluster:

Language-Agnostic:

curl "https://inference.svc.codfw.wmnet:30443/v1/models/revert-risk-model:predict" -d @input.json -H "Host: revert-risk-model.experimental.wikimedia.org" --http1.1 -k

Multilingual:

curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revert-risk-model:predict" -d @input.json -H "Host: revert-risk-model.experimental.wikimedia.org" --http1.1 -k

An example for input.json: { "lang": "ru", "rev_id": 123855516 }
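For anyone who prefers scripting the call, here is a minimal Python sketch of the same request as the Language-Agnostic curl command above. The URL, Host header, and payload fields are copied from that snippet; the function names (`build_request`, `predict`) are illustrative only, and the response format is not described in this task, so the sketch simply returns whatever JSON the service sends back.

```python
import json
import ssl
import urllib.request

# Values copied from the Language-Agnostic curl command above.
LANGUAGE_AGNOSTIC_URL = (
    "https://inference.svc.codfw.wmnet:30443/v1/models/revert-risk-model:predict"
)
HOST_HEADER = "revert-risk-model.experimental.wikimedia.org"


def build_request(lang, rev_id, url=LANGUAGE_AGNOSTIC_URL, host=HOST_HEADER):
    """Build (but do not send) the POST request that the curl command issues."""
    # Same body as the example input.json: {"lang": ..., "rev_id": ...}
    payload = json.dumps({"lang": lang, "rev_id": rev_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Host": host},  # equivalent of curl's -H "Host: ..."
        method="POST",
    )


def predict(lang, rev_id, timeout=30):
    """Send the request and return the decoded JSON response.

    Certificate verification is disabled to mirror curl's -k flag;
    this is an assumption carried over from the snippet, not a recommendation.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_request(lang, rev_id)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return json.load(response)
```

Calling `predict("ru", 123855516)` should only work from a host that can reach the internal endpoint; `build_request` can be used anywhere to inspect what would be sent.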

Updates

  • Discussing the integration of Revert Risk on MediaWiki: T329071

What does the timeline/roadmap look like for getting this model into Lift Wing and available as an API? Our team is considering working on a project next year that leverages this model, and it would be helpful to know the timeline.
cc @calbon

I've talked with the ML-Platform folks, and we are going to have this API available within this month. @achou is working on this and will let us know when the public endpoint is available.

Amazing, thanks!

Both models (Language-Agnostic and Multilingual) have been deployed to Lift Wing production. (T332998, T333124) The next step is to work on the public endpoints.

@Samwalton9, will your team be using internal endpoints or public endpoints for the project? Is there any more documentation about this potential project?

Thanks for the update @achou! You can find further information on our plans so far at T336934 or https://docs.google.com/presentation/d/1YiF9rfDKoTvoKVdRUYXAXB2jl6oQhn5-55emQ6TQCg8/edit. We're still in the very early phase of planning on this so I don't have a good answer for you about internal vs public just yet.