Update ORES filter thresholds for huwiki
Closed, Resolved · Public

Description

We have deployed an improved version of the models. The thresholds might need a minor update.

Event Timeline

Restricted Application added a subscriber: Aklapper.
kostajh added subscribers: SBisson, kostajh.

@SBisson would you have time to take on this one as well?

I could but @Tgr has expressed a special interest in it so I'm happy to step back and support.

Change 536732 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Update ORES filter threshold configuration for new huwiki model

https://gerrit.wikimedia.org/r/536732

One thing I noted while playing around with the data is that the frequency of edits matching damaging/likelygood is very low for anons (in the single digits monthly, while total anon edits tend to be between 4K-10K). Does that mean the filter threshold is poorly chosen (although it's high for editors, in the 80-90% range), the model is still biased against anons, or does this simply reflect the fact that anons are harder to trust? (It probably doesn't reflect edit quality - manual checks usually find that between a quarter and a third of anon edits are problematic.)

Also, goodfaith/likelybad and goodfaith/verylikelybad are barely different for anons (see graph here showing the fraction of edits these match monthly). They are fairly different for non-anonymous users but then there are (as one would expect) about 100x more matching anon edits. Could this be a threshold problem, or a bias problem, or is it completely normal?
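
For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of computing the fraction of edits a filter matches, split by anonymous and registered editors. It assumes a filter can be treated as a score range; the `match_fraction` helper, the record layout, the range bounds, and the sample scores are all hypothetical and are not taken from the actual huwiki data or configuration.

```python
from collections import defaultdict


def match_fraction(edits, low, high):
    """Return {group: fraction of edits whose score lies in [low, high]}.

    `edits` is a list of dicts with hypothetical keys:
      "anon"  - True for an anonymous edit, False otherwise
      "score" - model probability for the edit, between 0 and 1
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for edit in edits:
        group = "anon" if edit["anon"] else "registered"
        totals[group] += 1
        if low <= edit["score"] <= high:
            matches[group] += 1
    return {group: matches[group] / totals[group] for group in totals}


# Hypothetical sample, invented purely for illustration.
sample = [
    {"anon": True, "score": 0.05},
    {"anon": True, "score": 0.40},
    {"anon": True, "score": 0.70},
    {"anon": True, "score": 0.55},
    {"anon": False, "score": 0.02},
    {"anon": False, "score": 0.08},
    {"anon": False, "score": 0.03},
    {"anon": False, "score": 0.60},
]

# Treat the filter as the (assumed) score range 0.0 to 0.1.
fractions = match_fraction(sample, 0.0, 0.1)
print(fractions)
```

Running the same function per month and per filter would give the kind of series the graph shows. A large gap between the two groups does not by itself say whether the threshold or the model is the cause.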

Change 536732 merged by jenkins-bot:
[operations/mediawiki-config@master] Update ORES filter threshold configuration for new huwiki model

https://gerrit.wikimedia.org/r/536732

Mentioned in SAL (#wikimedia-operations) [2019-09-17T11:22:36Z] <awight@deploy1001> Synchronized wmf-config/VariantSettings.php: SWAT: [[gerrit:536732|Update ORES filter threshold configuration for new huwiki model (T230031)]] (duration: 00m 55s)

@Tgr -- is there something in particular you think I should review here? Do you think we should also have @Etonkovidova review? (her review usually comes before mine in the process)

@MMiller_WMF not anything in particular, I just used it as the "final" column of the table (should have said so explicitly, in hindsight). The questions in T230031#5492951 and T230031#5494291 are more for @Halfak and more out of curiosity than concern.

The result of the patch is that the system will judge slightly differently which edits to add warning colors to. That change is not really detectable without statistical analysis (or lots and lots of patrolling), so I assumed it cannot really be QA-ed. Even if I make a mistake in the patch and the recentchanges interface starts labelling patches really strangely, I'm not sure how apparent that would be during QA. @Etonkovidova what do you think?
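
To illustrate why such a change is hard to spot by eye, here is a rough sketch of how one might list the edits whose label changes when a threshold moves. It assumes an edit is flagged when its score is at or above a single cutoff; the function names, revision IDs, scores, and threshold values are hypothetical and do not come from the patch.

```python
def flagged(scores, threshold):
    """Return the set of revision IDs whose score is at or above the threshold."""
    return {rev_id for rev_id, score in scores.items() if score >= threshold}


def label_changes(scores, old_threshold, new_threshold):
    """Compare which revisions are flagged under the old and the new threshold."""
    old = flagged(scores, old_threshold)
    new = flagged(scores, new_threshold)
    return {
        "newly_flagged": sorted(new - old),
        "no_longer_flagged": sorted(old - new),
    }


# Hypothetical revision IDs and scores, invented for illustration.
scores = {101: 0.30, 102: 0.62, 103: 0.66, 104: 0.91}

# Hypothetical old and new cutoffs.
changes = label_changes(scores, 0.60, 0.65)
print(changes)
```

With a small threshold shift only edits scoring near the cutoff change label, which is why the difference tends to show up in aggregate statistics rather than in a casual look at the recentchanges list.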

Often it makes sense just to take a general look to check that everything is still in place and functioning. But, of course, it depends on the specifics of each case.