
Revert prediction data request
Closed, Resolved · Public

Authored By
Samwalton9-WMF
May 10 2023, 4:58 PM

Description

To understand how impactful this model would be if implemented in a revert bot I'd really appreciate some data on the following:

How many edits would be reverted if it was configured for 95% and 99% accuracy on: en.wiki, es.wiki, and uk.wiki?

Event Timeline

@diego per our chat about this model, this would be a really helpful metric for me to understand. Have I phrased the question in a way that's helpful? And are 95% and 99% accuracy realistic figures to be testing for?

@Pablo, do you have these numbers in the Risk Observatory? (If not, I would need until Monday to compute this.)

Yes, with a test dataset of all revisions from January to October 2022 I evaluated the model at 101 thresholds (0.00, 0.01, 0.02, ..., 1.00) on several wikis (unfortunately not ukwiki; I would need to compute that one).
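A minimal sketch of the threshold sweep described above, assuming we have per-revision (risk score, was-reverted) pairs; the thresholds run 0.00 to 1.00 in steps of 0.01, as in the evaluation. The data format and field names here are assumptions, not the real pipeline.

```python
# Sketch: for each threshold t, count how many revisions with score >= t
# were actually reverted vs. not, and the resulting precision.
from collections import Counter

def sweep_thresholds(rows):
    """rows: iterable of (score, reverted) pairs; returns per-threshold counts."""
    rows = list(rows)
    results = {}
    for i in range(101):  # thresholds 0.00, 0.01, ..., 1.00
        t = round(i / 100, 2)
        flagged = Counter(reverted for score, reverted in rows if score >= t)
        total = flagged[True] + flagged[False]
        results[t] = {
            "reverted": flagged[True],
            "not_reverted": flagged[False],
            "precision": flagged[True] / total if total else None,
        }
    return results

# Toy data, not real revisions:
table = sweep_thresholds([(0.99, True), (0.95, True), (0.92, False), (0.10, False)])
print(table[0.95])  # {'reverted': 2, 'not_reverted': 0, 'precision': 1.0}
```

The per-wiki CSVs linked below presumably contain exactly this kind of per-threshold breakdown.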

Results

image.png (290×989 px, 32 KB)
image.png (290×989 px, 31 KB)

https://gitlab.wikimedia.org/paragon/knowledge-integrity-risk-index/-/blob/main/data/reverts/enwiki.csv
https://gitlab.wikimedia.org/paragon/knowledge-integrity-risk-index/-/blob/main/data/reverts/eswiki.csv

Results distinguishing IP edits and user edits

image.png (790×589 px, 57 KB)
image.png (790×589 px, 56 KB)

https://gitlab.wikimedia.org/paragon/knowledge-integrity-risk-index/-/blob/main/data/reverts_anonymous/enwiki.csv
https://gitlab.wikimedia.org/paragon/knowledge-integrity-risk-index/-/blob/main/data/reverts_anonymous/eswiki.csv

Risk Observatory

The new version of the Risk Observatory contains metrics based on the notion of high-risk revision: a revision with a revert risk score greater than or equal to the threshold that maximises accuracy for that wiki in the test dataset (filtering out IP and bot edits).
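The "threshold that maximises accuracy" notion above can be sketched as follows: among candidate thresholds, pick the one where the number of correct decisions (high-risk and reverted, plus low-risk and kept) is largest. This is a toy illustration with made-up data, not the Risk Observatory's actual computation.

```python
# Sketch: choose the threshold with the highest classification accuracy
# on (score, reverted) pairs from a test dataset.
def accuracy_maximising_threshold(rows, thresholds):
    """rows: list of (score, reverted) pairs; returns (best_t, best_acc)."""
    best_t, best_acc = None, -1.0
    for t in thresholds:
        correct = sum(
            1 for score, reverted in rows
            if (score >= t) == reverted  # flagged & reverted, or unflagged & kept
        )
        acc = correct / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy data:
rows = [(0.99, True), (0.7, True), (0.6, False), (0.2, False)]
t, acc = accuracy_maximising_threshold(rows, [i / 10 for i in range(11)])
print(t, acc)  # 0.7 1.0
```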

@Pablo, can you please provide the absolute numbers for the three wikis that @Samwalton9 is asking for? Let's consider January 2022.

Sure, these are the numbers for enwiki and eswiki in January 2022 at different thresholds:

risk score ≥ 0.0

| wiki_db | reverted | count | ratio |
|---------|----------|---------|-------|
| enwiki | false | 2148599 | 0.882 |
| enwiki | true | 288768 | 0.118 |
| eswiki | false | 299485 | 0.818 |
| eswiki | true | 66769 | 0.182 |

risk score ≥ 0.9

| wiki_db | reverted | count | ratio |
|---------|----------|--------|-------|
| enwiki | false | 128723 | 0.55 |
| enwiki | true | 105494 | 0.45 |
| eswiki | false | 20290 | 0.398 |
| eswiki | true | 30705 | 0.602 |

risk score ≥ 0.91

| wiki_db | reverted | count | ratio |
|---------|----------|--------|-------|
| enwiki | false | 109162 | 0.533 |
| enwiki | true | 95759 | 0.467 |
| eswiki | false | 17297 | 0.38 |
| eswiki | true | 28209 | 0.62 |

risk score ≥ 0.92

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 89724 | 0.513 |
| enwiki | true | 85174 | 0.487 |
| eswiki | false | 14261 | 0.361 |
| eswiki | true | 25289 | 0.639 |

risk score ≥ 0.93

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 71160 | 0.49 |
| enwiki | true | 73995 | 0.51 |
| eswiki | false | 11271 | 0.336 |
| eswiki | true | 22271 | 0.664 |

risk score ≥ 0.94

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 54180 | 0.464 |
| enwiki | true | 62475 | 0.536 |
| eswiki | false | 8459 | 0.308 |
| eswiki | true | 18982 | 0.692 |

risk score ≥ 0.95

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 39062 | 0.435 |
| enwiki | true | 50759 | 0.565 |
| eswiki | false | 5980 | 0.277 |
| eswiki | true | 15633 | 0.723 |

risk score ≥ 0.96

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 25745 | 0.397 |
| enwiki | true | 39131 | 0.603 |
| eswiki | false | 3741 | 0.237 |
| eswiki | true | 12055 | 0.763 |

risk score ≥ 0.97

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 13753 | 0.34 |
| enwiki | true | 26688 | 0.66 |
| eswiki | false | 1875 | 0.187 |
| eswiki | true | 8173 | 0.813 |

risk score ≥ 0.98

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 4476 | 0.249 |
| enwiki | true | 13491 | 0.751 |
| eswiki | false | 571 | 0.119 |
| eswiki | true | 4244 | 0.881 |

risk score ≥ 0.99

| wiki_db | reverted | count | ratio |
|---------|----------|-------|-------|
| enwiki | false | 410 | 0.112 |
| enwiki | true | 3245 | 0.888 |
| eswiki | false | 63 | 0.049 |
| eswiki | true | 1235 | 0.951 |

Note: results were obtained with the language-agnostic model, not the multilingual one. Revisions (including IP edits and bot edits) were retrieved with this script: https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/mnz/examples/examples/notebooks/revertrisk_example.ipynb

Much appreciated! Summarising the enwiki data from above in a more readable format so I can remember how this should be interpreted:

  • In Jan 2022 on English Wikipedia, 288,768 edits were reverted (12% of all edits).
  • With a threshold of 0.98, the model would revert 13,491 of those (4.7% of reverted edits; 450 per day) and with a threshold of 0.99 would revert 3,245 (1.1% of reverted edits; 108 per day).
  • At 0.98 it would have a false positive rate of 25%, reverting 4,476 edits which were not reverted by the community, and at 0.99 a false positive rate of 11% (410 such edits).
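The arithmetic in the summary above can be checked directly against the enwiki counts in the tables (0.98 threshold shown; the 0.99 case follows the same pattern):

```python
# Recomputing the quoted percentages from the reported enwiki counts.
jan_total_reverted = 288768          # enwiki edits reverted in Jan 2022
t98_true, t98_false = 13491, 4476    # score >= 0.98: reverted / not reverted

coverage_98 = t98_true / jan_total_reverted      # share of all reverts caught
fp_rate_98 = t98_false / (t98_true + t98_false)  # share of flagged edits kept
print(f"{coverage_98:.1%} of reverted edits caught; {fp_rate_98:.1%} flagged but kept")
# 4.7% of reverted edits caught; 24.9% flagged but kept
```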

Worth noting, as I understand it, that this definition of reverted includes any edit which takes a page back to an earlier state. That means we're not comparing only to anti-vandalism edits.

I'm curious what kinds of edits the model is giving false positives for at 0.98/0.99, i.e. is there a pattern? Could we easily drop that number by, for example, excluding certain namespaces or user groups?

We know that all these models (Revert Risk but also ORES) are biased against IP edits. That said, our model is now mitigating that bias, and we are doing better on that front than former models like ORES.

The model that Pablo is reporting here is the language-agnostic one; we ran that one because it is faster. However, we have another model based on mBERT, a large language model, that is more accurate and less biased against IP edits, but slower. So one option could be to apply the language-agnostic model (faster) to edits by registered users, and use the slower one for IP edits.

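The routing idea above (fast language-agnostic model for registered users, slower but less IP-biased mBERT-based model for IP edits) could be sketched like this. The function and class names are placeholders, not a real Lift Wing API:

```python
# Sketch: dispatch each edit to one of two scoring models by editor type.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Edit:
    rev_id: int
    is_anonymous: bool  # True for IP edits

def route_score(edit: Edit,
                fast_model: Callable[[int], float],
                accurate_model: Callable[[int], float]) -> float:
    """Score IP edits with the accurate model, everything else with the fast one."""
    model = accurate_model if edit.is_anonymous else fast_model
    return model(edit.rev_id)

# Stub scorers standing in for the real model services:
fast = lambda rev_id: 0.42
accurate = lambda rev_id: 0.37

print(route_score(Edit(1, is_anonymous=True), fast, accurate))   # 0.37
print(route_score(Edit(2, is_anonymous=False), fast, accurate))  # 0.42
```

The trade-off is throughput: only the (typically smaller) stream of IP edits pays the latency cost of the larger model.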

Just to clarify, do you mean that the false positives are probably primarily good IP edits?

Can you remind me how many languages the mBert version supports?