
Enable ORES filters for ukwiki (Ukrainian Wikipedia)
Closed, Resolved · Public

Description

ORES now has a damaging and goodfaith model for ukwiki. Please add the contribution quality and user intention filters to RCFilters on that wiki.

Contact: @Ata and @Base

Event Timeline

Restricted Application added subscribers: Base, Aklapper.
Halfak added a subscriber: Ata.

@Halfak Hi, is it possible to enable ORES filters on ukwiki now (at least as a beta feature)? What's left? Are you waiting for https://phabricator.wikimedia.org/T251571?

I don't think anything is blocking the deployment of the filters. This should be on the Growth-Team backlog. @MMiller_WMF, it looks like the Ukrainian Wikipedians have been waiting a while for the filters to be turned on in recent changes.

@Halfak -- thanks for bringing this up. I talked about this with @Trizek-WMF, and we're planning to get it on our schedule next week. I had it cooking on our to-do list, but forgot to update you.

Note: the team discussed this today, and needs to return to it next week. We have a full plate this week.

This confused me for a while, but I think I found an OK configuration. The stats are a bit strange, though: on the other wikis I have seen, the precision and recall curves form more or less an "X" shape for the "bad" outcomes, while here, especially for goodfaith, precision is not even remotely monotonic, and it is simply not possible to reach a precision better than 0.6. Is that legit?

[Attached graphs of the ukwiki model statistics, one per outcome: damaging = false, damaging = true, goodfaith = false, goodfaith = true]

Anyway, these are the numbers I ended up with:

| model | filter | min | max | condition | precision | recall |
|---|---|---|---|---|---|---|
| damaging | likelygood | 0 | 0.147 | maximum recall @ precision >= 0.997 | 0.997 | 0.899 |
| damaging | maybebad | 0.122 | 1 | maximum filter_rate @ recall >= 0.9 (default) | 0.161 | 0.903 |
| damaging | likelybad | 0.745 | 1 | maximum recall @ precision >= 0.45 | 0.451 | 0.258 |
| damaging | verylikelybad | | | (dropped) | | |
| goodfaith | likelygood | 0.944 | 1 | maximum recall @ precision >= 0.999 | 0.999 | 0.88 |
| goodfaith | maybebad | 0 | 0.777 | maximum recall @ precision >= 0.15 | 0.15 | 0.74 |
| goodfaith | likelybad | 0 | 0.301 | maximum recall @ precision >= 0.45 | 0.451 | 0.246 |
| goodfaith | verylikelybad | | | (dropped) | | |

Damaging verylikelybad was dropped because getting recall above 0.1 would mean accepting a precision of about 0.55, and the guide says we should aim for high precision.
Goodfaith verylikelybad was dropped because precision above 0.6 is not reachable at all, and getting recall >= 0.1 would mean accepting a precision of about 0.48.
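Each condition in the table picks one point on the model's threshold curve: the threshold that maximizes one statistic while keeping another above a floor. Below is a minimal Python sketch of that selection, assuming the curve is available as a list of points. The `ThresholdPoint` type, the `maximize` helper, and the toy curve are hypothetical illustrations, not the ORES API or ukwiki's real statistics. The last query, "maximum precision @ recall >= 0.1", corresponds to the check used above to drop the verylikelybad filters.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ThresholdPoint:
    """One point on a model's threshold curve (hypothetical layout)."""
    threshold: float
    precision: float
    recall: float
    filter_rate: float  # assumed: share of edits NOT flagged at this threshold


def maximize(
    points: Sequence[ThresholdPoint],
    target: str,
    constraint: str,
    minimum: float,
) -> Optional[ThresholdPoint]:
    """Return the point maximizing `target` subject to `constraint` >= minimum.

    Mirrors conditions such as "maximum recall @ precision >= 0.45".
    Every point is checked, so a non-monotonic curve is handled correctly.
    Returns None when no point satisfies the constraint.
    """
    candidates = [p for p in points if getattr(p, constraint) >= minimum]
    if not candidates:
        return None
    return max(candidates, key=lambda p: getattr(p, target))


# Toy curve for a "bad" outcome (made-up numbers, not ukwiki data).
# Precision falls again at the highest threshold, as on a weak model.
curve = [
    ThresholdPoint(0.1, 0.15, 0.95, 0.40),
    ThresholdPoint(0.3, 0.30, 0.60, 0.80),
    ThresholdPoint(0.5, 0.45, 0.30, 0.92),
    ThresholdPoint(0.7, 0.55, 0.12, 0.97),
    ThresholdPoint(0.9, 0.50, 0.03, 0.99),
]

print(maximize(curve, "recall", "precision", 0.45))     # a likelybad-style condition
print(maximize(curve, "recall", "precision", 0.9))      # unreachable: None
print(maximize(curve, "filter_rate", "recall", 0.9))    # maybebad-style default
print(maximize(curve, "precision", "recall", 0.1))      # best precision at recall >= 0.1
```

With this toy curve, "maximum recall @ precision >= 0.9" has no answer, which is the same situation as a verylikelybad filter that the model cannot support.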

Change 655301 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Alphabetize ORES settings

https://gerrit.wikimedia.org/r/655301

Change 655302 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Enable ORES filters on ukwiki

https://gerrit.wikimedia.org/r/655302

@Halfak -- would you be able to look at @Tgr's question above? I can also ask the people on the Scoring Platform team. Thank you!


I think you've interpreted these graphs correctly, and it means the goodfaith model for this wiki just isn't very good. Unfortunately this is common, especially in cases where bad faith edits are rare in the labeling data.
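To see how rare bad-faith edits can make the precision curve jump around: precision at a threshold is computed only over the edits scored at or above it, so when there are few positives, each one entering or leaving that set moves the number sharply. The toy Python sketch below is a simplified illustration with made-up scores and labels (not ukwiki data); `precision_at` is a hypothetical helper, not part of ORES.

```python
from typing import List, Optional, Tuple


def precision_at(scored: List[Tuple[float, bool]], threshold: float) -> Optional[float]:
    """Precision among edits scored >= threshold; None if nothing is flagged."""
    flagged = [is_bad for score, is_bad in scored if score >= threshold]
    if not flagged:
        return None
    # True counts as 1, so the sum is the number of true positives.
    return sum(flagged) / len(flagged)


# Hypothetical labeled sample: (model score, actually bad faith?).
# Only 3 of the 10 edits are positives, so each one moves precision a lot.
sample = [
    (0.95, False), (0.90, True), (0.85, True), (0.70, False), (0.60, True),
    (0.50, False), (0.40, False), (0.30, False), (0.20, False), (0.10, False),
]

for t in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
    print(t, precision_at(sample, t))
```

With this sample, precision goes 0.5, 0.6, 0.5, about 0.67, 0.5, 0.0 as the threshold rises. It never exceeds about 0.67, partly because the single highest-scored edit is a false positive, so every non-empty flagged set contains at least one error.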

Your numbers look good to me. You're right that we shouldn't offer verylikelybad filters for either model, because the models don't perform well enough for that. The other filters are set well and behave as expected. The recall for the likelybad filters is low, but that's what happens with poor models like these.

Thanks for checking, @Catrope!
Scheduling deployment to the Tuesday SF morning time slot.

Change 655301 merged by jenkins-bot:
[operations/mediawiki-config@master] Alphabetize ORES settings

https://gerrit.wikimedia.org/r/655301

Change 655302 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable ORES filters on ukwiki

https://gerrit.wikimedia.org/r/655302

Mentioned in SAL (#wikimedia-operations) [2021-01-12T19:46:48Z] <tgr@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:655302|Enable ORES filters on ukwiki (T256887)]] (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2021-01-12T19:48:22Z] <tgr_> synced Config: [[gerrit:655301|Alphabetize ORES settings (T256887)]]

@Ata, @Base, @Trizek-WMF This is now enabled on ukwiki. It might take a while to process all old revisions, but you should see the ORES filters working for new changes. Please report it if they behave unreasonably.

Etonkovidova subscribed.

Checked and moved to the PM column for @MMiller_WMF, FYI:

  • The filters are present (three filters for each model).
  • The translation seems to be in place.
  • The stats on Special:ORESModels look reasonable.

Thank you! I have a general question. If I understand correctly (I am not really familiar with the terminology here), you mentioned above that the model for ukwiki is not as good as the models for other wikis. I assume this will have some consequences for the usability of the filters? Is the model fixed, or are there ways to improve it? (Or perhaps it can improve itself based on what is being reverted, or on some other feed?)

> Thank you! I have a general question: if I understand correctly, I am not really aware of the terminology here, you mention above that the model for ukwiki is not as good as it is for other wikis.

https://www.mediawiki.org/wiki/ORES/Thresholds is a good resource about the terminology.
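For quick reference, the three statistics used in the filter conditions can be computed from a labeled sample as below. This is a minimal sketch assuming the usual definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), filter rate = share of edits not flagged); the `Counts` type and the numbers in the example are hypothetical, and the ORES/Thresholds page remains the authoritative description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one outcome at one threshold (hypothetical)."""
    tp: int  # flagged and actually bad
    fp: int  # flagged but actually fine
    fn: int  # not flagged but actually bad
    tn: int  # not flagged and actually fine

    @property
    def precision(self) -> float:
        # Of the edits the filter shows, what share is really bad?
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Of all really bad edits, what share does the filter catch?
        return self.tp / (self.tp + self.fn)

    @property
    def filter_rate(self) -> float:
        # Share of all edits the filter does not flag (filters out of view).
        total = self.tp + self.fp + self.fn + self.tn
        return (self.fn + self.tn) / total


# Hypothetical: 1000 edits, 50 of them bad; the filter flags 40 edits, 18 of them bad.
c = Counts(tp=18, fp=22, fn=32, tn=928)
print(c.precision, c.recall, c.filter_rate)
```

With these made-up counts the filter has precision 0.45, recall 0.36, and filter rate 0.96: it hides 96% of edits, fewer than half of the edits it shows are actually bad, and it misses most of the bad ones. That is roughly the trade-off a weak model forces on a likelybad filter.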

> I assume this will have some consequences for the usability of the filters? Is it immutable or there are ways to improve the model? (Or perhaps it can self improve basing on what is being reverted or some other feed?)

Good question. I'm not sure, but I believe you may need to run another labeling campaign (https://www.mediawiki.org/wiki/ORES/Get_support#Advanced_edit_quality_support) to improve the model. Maybe @Halfak knows what the next step is from here.

Thanks for your question about how to improve the models, @Base. I'm tagging @calbon, who works on the Scoring Platform team and may be able to help.