We have deployed an improved version of the models. The thresholds might need a minor update.
Related patch:
|Repository|Branch|Diff|Subject|
|operations/mediawiki-config|master|+5 -7|Update ORES filter threshold configuration for new huwiki model|

Related tasks:
|Status|Assignee|Task|
|Resolved|Tgr|T230031 Update ORES filter thresholds for huwiki|
|Resolved|ACraze|T228078 Retrain damaging/goodfaith models for huwiki|
|Resolved|Halfak|T223882 Re-label huwiki damaging and badfaith edits|
|Open|None|T223899 Information about finished campaigns should be accessible in Wikilabels|
One thing I noticed while playing around with the data is that the frequency of edits matching damaging/likelygood is very low for anons (in the single digits monthly, while total anon edits tend to be between 4K and 10K). Does that mean the filter threshold is poorly chosen (although it is high for registered editors, in the 80-90% range), that the model is still biased against anons, or does it simply reflect the fact that anon edits are harder to trust? (It probably doesn't reflect edit quality: manual checks usually find that between a quarter and a third of anon edits are problematic.)
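To make the check above concrete, here is a minimal sketch of how such a match rate could be computed. The cutoff value and the scores are invented for illustration; the real likelygood threshold comes from the deployed configuration, not from this snippet.

```python
# Hypothetical sketch: given per-edit "damaging" probabilities from the model,
# count how many anon edits a likelygood-style filter would match.
# The cutoff (0.1) and the sample scores are made up for illustration.

def likelygood_matches(damaging_probs, max_damaging=0.1):
    """Return the edits whose damaging probability is at or below the cutoff,
    i.e. the edits the damaging/likelygood filter would highlight."""
    return [p for p in damaging_probs if p <= max_damaging]

# Toy monthly sample: anon edits tend to score higher on "damaging",
# so very few clear the likelygood bar.
anon_scores = [0.05, 0.35, 0.6, 0.22, 0.48, 0.81, 0.3, 0.12]
matches = likelygood_matches(anon_scores)
print(len(matches), "matches out of", len(anon_scores), "edits")
```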
Also, goodfaith/likelybad and goodfaith/verylikelybad are barely different for anons (see the graph here showing the fraction of edits each matches monthly). They are fairly different for non-anonymous users, but then there are (as one would expect) about 100x more matching anon edits. Could this be a threshold problem, or a bias problem, or is it completely normal?
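One way the two filters could end up nearly identical without anything being wrong: if anon goodfaith scores cluster near the extremes, two different cutoffs select almost the same set of edits. The cutoffs below are invented for illustration, not the real huwiki values.

```python
# Toy illustration: with a bimodal score distribution, a likelybad-style
# cutoff and a stricter verylikelybad-style cutoff match the same edits.
# All numbers here are hypothetical.

def match_fraction(goodfaith_probs, max_goodfaith):
    """Fraction of edits whose goodfaith probability is at or below the cutoff."""
    hits = sum(1 for p in goodfaith_probs if p <= max_goodfaith)
    return hits / len(goodfaith_probs)

# Scores pile up near 0.05 and 0.95, with nothing in between.
scores = [0.03, 0.05, 0.07, 0.92, 0.94, 0.96, 0.97, 0.98]
print(match_fraction(scores, 0.15))  # likelybad-style cutoff
print(match_fraction(scores, 0.10))  # stricter cutoff, same edits matched
```

If the real anon score distribution looks like this, the near-identical curves would be normal rather than a threshold or bias problem.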
@MMiller_WMF not anything in particular, I just used it as the "final" column of the table (should have said so explicitly, in hindsight). The questions in T230031#5492951 and T230031#5494291 are more for @Halfak and more out of curiosity than concern.
The result of the patch is that the system will judge slightly differently which edits to add warning colors to. That change is not really detectable without statistical analysis (or lots and lots of patrolling), so I assumed it cannot really be QA-ed. Even if I made a mistake in the patch and the recentchanges interface started labelling edits really strangely, I'm not sure how apparent that would be during QA. @Etonkovidova, what do you think?
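If someone did want to QA this statistically, one option would be to score the same sample of revisions against the old and new thresholds and check that only a small fraction change filter level. The threshold values, the level-mapping helper, and the sample scores below are all hypothetical, not the actual contents of the patch.

```python
# Hypothetical before/after comparison for a threshold config change.
# Thresholds are (label, minimum damaging probability) pairs, highest first;
# the values are invented for illustration.

def filter_level(damaging_prob, thresholds):
    """Map a damaging probability to a filter label using cutoffs
    sorted from the highest cutoff down."""
    for name, cutoff in thresholds:
        if damaging_prob >= cutoff:
            return name
    return "likelygood"

OLD = [("verylikelybad", 0.90), ("likelybad", 0.60)]
NEW = [("verylikelybad", 0.85), ("likelybad", 0.60)]

sample = [0.1, 0.62, 0.87, 0.95, 0.3, 0.7]
changed = sum(1 for p in sample if filter_level(p, OLD) != filter_level(p, NEW))
print(changed, "of", len(sample), "edits change label")
```

Only edits whose scores fall between the old and new cutoffs change label, which is exactly the "not detectable without statistics" situation described above.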