Create a language agnostic model to predict reverts on Wikipedia
Closed, Resolved, Public

Description

The Research team, in collaboration with the ML-Platform team, is creating a new service to help patrollers detect revisions that might be reverted (see the parent task).

This task focuses on a fully language-agnostic approach.

Event Timeline

diego changed the task status from Open to In Progress. Aug 2 2022, 12:58 PM
diego triaged this task as High priority.

Updates

  • We have developed a language agnostic ML model to predict reverts.
  • The model has an accuracy of 80% on a balanced dataset, compared to the 66% given by ORES.
  • Research code is available here.
  • @MunizaA is working on implementing the code as a service; @AikoChou will then deploy it to LiftWing. T

Updates

  • @MunizaA has written the model code to be hosted on Liftwing and shared it with @achou.
  • @achou is testing the model locally before uploading it to Liftwing.
  • @diego is working on a new version of the model that adds new features.

Updates

  • The code is being refactored by @MunizaA and reviewed by @achou. They are trying to find the optimal architecture in order to make the code easier to maintain and update.
  • We have found that the model performs poorly on anonymous edits. I'm working on updating the model to improve this.

Updates

  • The multilingual and language-agnostic models have been deployed to production. Check the details in the related tasks.
  • We are now onboarding @Sheilakaruku to develop a user interface for working with these models (T318634).
This comment was removed by diego.

For the record, here is a snippet (by @achou) to try this model:

curl "https://inference.svc.codfw.wmnet:30443/v1/models/revert-risk-model:predict" -d @input.json -H "Host: revert-risk-model.experimental.wikimedia.org" --http1.1 -k

An example for input.json: { "lang": "ru", "rev_id": 123855516 }

Updates

  • We have presented the results of this project at the WMF's "Monthly Tech All Meeting".

Updates

  • We are coordinating with the ML-team to create a public stream with this model's score.

Updates

  • We are discussing how to integrate this model on the Recent Changes page on MediaWiki (T329071)

Updates

  • We are discussing the schema for this (and other) ML-generated events (T331401)

Updates

  • We are coordinating with the ML team to provide a public endpoint for these models.

@diego Is there an updated URL? I get 404 not found when I attempt to POST to this URL from mwmaint1002. (https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_revert_risk points to T314385#8496547 as the directions for testing out the model.)

Hi @kostajh, there is an updated URL as the model server has been moved to Lift Wing production.

curl "https://inference.discovery.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d '{"rev_id": 123456, "lang": "en"}' -H "Host: revertrisk-language-agnostic.revertrisk.wikimedia.org" --http1.1 -k

For more information, please refer to the Lift Wing documentation: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage
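For anyone who prefers to call the model from Python, here is a minimal sketch of the same request as the curl command above, using only the standard library. The URL, Host header, and payload fields are copied from that command; the function names, the `Content-Type` header, and the timeout are my own assumptions, and the response is returned as decoded JSON without assuming any particular fields. Like the curl command, it only works from a host inside the internal network, and it disables certificate verification to mirror `-k`.

```python
import json
import ssl
import urllib.request

# Copied from the curl command above (internal endpoint, not reachable publicly).
LIFTWING_URL = (
    "https://inference.discovery.wmnet:30443"
    "/v1/models/revertrisk-language-agnostic:predict"
)
LIFTWING_HOST = "revertrisk-language-agnostic.revertrisk.wikimedia.org"


def build_request(rev_id: int, lang: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one revision.

    Hypothetical helper; the Content-Type header is an assumption,
    since the curl command does not set one explicitly.
    """
    payload = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        LIFTWING_URL,
        data=payload,
        headers={"Host": LIFTWING_HOST, "Content-Type": "application/json"},
        method="POST",
    )


def get_prediction(rev_id: int, lang: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response unchanged."""
    # Equivalent of curl's -k: skip certificate verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_request(rev_id, lang)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `get_prediction(123456, "en")` from a host with access to the endpoint should correspond to the curl example; see the Lift Wing documentation linked above for the response format.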
Thanks! :)