
What is the distribution of Revert Risk scores for Content Translation edits?
Open, In Progress, Medium, Public

Description

What team/program is this request for?
Moderator-Tools-Team, Automoderator

What are you requesting?
Analysis of Revert Risk scores for edits tagged with Content Translation (CX). We want to understand whether Automoderator should consider CX edits at all: if they have a high likelihood of false positives, we could skip them.

It would be good to understand the distribution of the scores (e.g. the median), and also to get some example edits with scores over 0.98.
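As a rough illustration of the requested summary, the sketch below computes quartiles and the share of scores above 0.98 for a list of scores. The function name `summarize_scores` and the example values are hypothetical; real scores would come from the revert risk data, and the actual analysis may use different statistics.

```python
from statistics import quantiles


def summarize_scores(scores, threshold=0.98):
    """Summarize a list of revert risk scores (hypothetical helper).

    Returns quartiles plus the share of scores strictly above `threshold`.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to compute quartiles")
    ordered = sorted(scores)
    # method="inclusive" treats the data as the full population of edits.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    above = [s for s in ordered if s > threshold]
    return {
        "count": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": ordered[-1],
        "share_above_threshold": len(above) / len(ordered),
    }


# Made-up scores, for illustration only.
example_scores = [0.12, 0.35, 0.5, 0.81, 0.99]
summary = summarize_scores(example_scores)
```

Edits counted in `share_above_threshold` are the ones worth sampling by hand as candidate false positives.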

What is the problem you're trying to solve?
Reducing the false positive rate of Automoderator.

What decision will you make or action will you take with the deliverable?
Whether to skip CX edits or not.

Details

Other Assignee
KCVelaga_WMF
Title: T358128: Create Revert Risk Scores for Content Translation
Reference: kcvelaga/moderator-tools-FY24!2
Author: jebe
Source Branch: T358128-Distribution-of-Revert-Risk-scores-for-Content-Translation-edits
Dest Branch: main

Event Timeline

KCVelaga_WMF triaged this task as Medium priority.
KCVelaga_WMF updated Other Assignee, added: KCVelaga_WMF.
KCVelaga_WMF subscribed.

@JEbe-WMF

The revision tags are: contenttranslation or contenttranslation-v2

In the related analysis for extended confirmed, I used the scores for 2022, as those were the latest available at that point. However, the scores are now updated monthly and are available at risk_observatory.revert_risk_predictions in the cluster.

Also, the analysis can be limited to the past six months only, instead of an entire year.
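Since the scores are updated monthly, one way to express the six-month window is as a list of (year, month) pairs. This is only a sketch: the helper `last_n_months` is hypothetical, it assumes "past six months" means the six full calendar months before a reference date, and how the table is actually partitioned is not stated in this task.

```python
from datetime import date


def last_n_months(reference, n=6):
    """Return (year, month) pairs for the n full calendar months before `reference`.

    Hypothetical helper; the month of `reference` itself is excluded.
    """
    months = []
    year, month = reference.year, reference.month
    for _ in range(n):
        month -= 1
        if month == 0:
            # Step back across a year boundary.
            year, month = year - 1, 12
        months.append((year, month))
    # Oldest month first.
    return list(reversed(months))


# Reference date chosen for illustration only.
window = last_n_months(date(2024, 6, 7))
```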

KCVelaga_WMF changed the task status from Open to In Progress. Jun 7 2024, 3:56 PM
KCVelaga_WMF moved this task from Current Quarter to Tracking on the Product-Analytics board.

@JEbe-WMF Thanks for working on the initial analysis. The following improvements would be helpful:

  • Currently, we are only capturing content translations, specifically page creations. In addition, we can check for the edit tag sectiontranslation.
  • When selecting edits, we can use three classifications: cx or cx-2 tags with rev_parent_id = 0 (i.e. page creations); cx or cx-2 tags with rev_parent_id != 0 (i.e. overwriting); and section translation edits with rev_parent_id != 0 (i.e. section additions).
  • Once we get those results, we can list the distributions for each of those conditions.
  • You can ignore the conditions listed in the function rr_dist_wiki (they were originally written for extended confirmed) and use the ones above.
  • is_revert and is_anon can be skipped, as cx and sx edits wouldn't be reverting anything and the tools are limited to registered users.
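The three classifications above could be sketched roughly as follows. The function names, class labels, and example rows are hypothetical, and the sketch assumes that an edit carrying both a cx tag and the sectiontranslation tag should count as a section translation; that precedence is not specified in this task.

```python
from collections import defaultdict
from statistics import median

CX_TAGS = {"contenttranslation", "contenttranslation-v2"}
SX_TAG = "sectiontranslation"


def classify_translation_edit(tags, rev_parent_id):
    """Map an edit to one of the three classes, or None if it fits none."""
    tags = set(tags)
    # Assumption: the sectiontranslation tag takes precedence over cx tags.
    if SX_TAG in tags:
        return "sx_section_addition" if rev_parent_id != 0 else None
    if tags & CX_TAGS:
        return "cx_page_creation" if rev_parent_id == 0 else "cx_overwrite"
    return None


def median_score_by_class(edits):
    """Group (tags, rev_parent_id, score) rows by class and take the median."""
    grouped = defaultdict(list)
    for tags, rev_parent_id, score in edits:
        label = classify_translation_edit(tags, rev_parent_id)
        if label is not None:
            grouped[label].append(score)
    return {label: median(scores) for label, scores in grouped.items()}


# Made-up rows, for illustration only.
example_edits = [
    (["contenttranslation"], 0, 0.25),
    (["contenttranslation-v2"], 0, 0.75),
    (["contenttranslation"], 12345, 0.6),
    (["contenttranslation", "sectiontranslation"], 12346, 0.4),
    (["mobile edit"], 12347, 0.9),
]
medians = median_score_by_class(example_edits)
```

Keeping the classification in one function makes it easy to swap the median for a full distribution per class later.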

Let me know if you have any questions.