|Status||Assigned||Task|
|Resolved||None||T130213 [Epic] Edit quality models (damaging/goodfaith)|
|Duplicate||None||T130294 Deploy edit quality models for ukwiki|
|Resolved||Tgr||T256887 Enable ORES filters for ukwiki (Ukrainian Wikipedia)|
This confused me for a while, but I think I found an OK configuration. The stats are a bit strange, though. For the other wikis I have seen, the precision and recall curves form more or less an "X" shape for the "bad" outcomes, while here, especially for goodfaith, precision is not even remotely monotonic, and it is simply not possible to reach a precision better than 0.6. Is that legit?
[Graphs, one per outcome: damaging = false, damaging = true, goodfaith = false, goodfaith = true]
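Both observations (non-monotonic precision and a ceiling on attainable precision) can be checked numerically instead of by eye. The following is a minimal sketch, assuming the threshold statistics are available as `(threshold, precision, recall)` tuples. The function names and the sample rows are invented for illustration and are not the actual ukwiki statistics.

```python
from typing import List, Tuple

# Hypothetical row layout: (threshold, precision, recall).
Row = Tuple[float, float, float]


def precision_is_monotonic(rows: List[Row]) -> bool:
    """Return True if precision never decreases as the threshold rises."""
    ordered = sorted(rows)  # tuples sort by threshold first
    precisions = [precision for _, precision, _ in ordered]
    return all(a <= b for a, b in zip(precisions, precisions[1:]))


def max_precision(rows: List[Row]) -> float:
    """Return the best precision reachable at any threshold."""
    return max(precision for _, precision, _ in rows)


# Invented sample data, NOT real ukwiki numbers.
sample = [
    (0.1, 0.20, 0.95),
    (0.3, 0.45, 0.60),
    (0.5, 0.38, 0.40),  # precision dips here, so the curve is not monotonic
    (0.7, 0.60, 0.15),
    (0.9, 0.52, 0.03),
]

print(precision_is_monotonic(sample))  # False
print(max_precision(sample))  # 0.6
```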
Anyway the numbers I ended up with:
|Model||Filter||Min score||Max score||Threshold expression||Precision||Recall|
|damaging||likelygood||0||0.147||maximum recall @ precision >= 0.997||0.997||0.899|
|damaging||maybebad||0.122||1||maximum filter_rate @ recall >= 0.9 (default)||0.161||0.903|
|damaging||likelybad||0.745||1||maximum recall @ precision >= 0.45||0.451||0.258|
|goodfaith||likelygood||0.944||1||maximum recall @ precision >= 0.999||0.999||0.88|
|goodfaith||maybebad||0||0.777||maximum recall @ precision >= 0.15||0.15||0.74|
|goodfaith||likelybad||0||0.301||maximum recall @ precision >= 0.45||0.451||0.246|
Damaging verylikelybad was dropped because it would need a precision of ~0.55 to get recall above 0.1, and the guide says we should aim for high precision.
Goodfaith verylikelybad was dropped because precision levels above 0.6 are completely impossible and recall >= 0.1 would take something like 0.48 precision.
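The selection rule behind expressions like "maximum recall @ precision >= 0.45", together with the decision to drop a filter whose recall stays below 0.1, can be sketched roughly as follows. This is an illustration in Python and not the code ORES itself runs. The helper names are hypothetical, and the sample rows are partly invented: the first two echo the damaging rows of the table above, the last two are made up.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (threshold, precision, recall).
Row = Tuple[float, float, float]


def max_recall_at_precision(rows: List[Row], min_precision: float) -> Optional[Row]:
    """Pick the row with the highest recall among rows meeting the precision floor."""
    candidates = [row for row in rows if row[1] >= min_precision]
    if not candidates:
        return None  # the requested precision is unreachable
    return max(candidates, key=lambda row: row[2])


def usable_filter(rows: List[Row], min_precision: float,
                  min_recall: float = 0.1) -> Optional[Row]:
    """Return the chosen row, or None if the filter should be dropped."""
    best = max_recall_at_precision(rows, min_precision)
    if best is None or best[2] < min_recall:
        return None
    return best


sample = [
    (0.122, 0.161, 0.903),  # from the table above (damaging maybebad)
    (0.745, 0.451, 0.258),  # from the table above (damaging likelybad)
    (0.900, 0.550, 0.110),  # invented
    (0.960, 0.610, 0.040),  # invented
]

print(usable_filter(sample, 0.45))  # (0.745, 0.451, 0.258)
print(usable_filter(sample, 0.60))  # None: recall 0.04 is below 0.1
print(usable_filter(sample, 0.70))  # None: precision 0.7 is unreachable
```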
I think you've interpreted these graphs correctly, and it means the goodfaith model for this wiki just isn't very good. Unfortunately this is common, especially in cases where bad faith edits are rare in the labeling data.
Your numbers look good to me. You're right that we shouldn't offer verylikelybad filters for either model, because the models don't perform well enough for that. The other filters are set well and behave as expected. The recall for the likelybad filters is low, but that's what happens with poor models like these.
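For anyone who wants to re-check numbers like these, the sketch below builds a query URL for one threshold expression. It assumes the ORES v3 scores endpoint and its `model_info=statistics.thresholds.<outcome>."<expression>"` parameter format, as I understand them from the ORES documentation; treat the base URL, the parameter layout, and the helper name `threshold_query_url` as assumptions to verify before relying on them. The code only constructs the URL and makes no network request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; verify against the current ORES documentation.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def threshold_query_url(wiki: str, model: str, outcome: bool, expression: str) -> str:
    """Build a URL asking for the statistics of one threshold expression."""
    outcome_key = "true" if outcome else "false"
    # Assumed path format: statistics.thresholds.<outcome>."<expression>"
    model_info = f'statistics.thresholds.{outcome_key}."{expression}"'
    query = urlencode({"models": model, "model_info": model_info})
    return f"{ORES_BASE}/{wiki}/?{query}"


url = threshold_query_url(
    "ukwiki", "damaging", True, "maximum recall @ precision >= 0.45"
)
print(url)

# Decode the query string again to confirm the expression survived encoding.
print(parse_qs(urlsplit(url).query)["model_info"][0])
```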
Thank you! I have a general question. I am not really familiar with the terminology here, but if I understand correctly, you mention above that the model for ukwiki is not as good as the models for other wikis. I assume this will have some consequences for the usability of the filters? Is the model fixed, or are there ways to improve it? (Or perhaps it can improve itself based on what is being reverted, or on some other feed?)
https://www.mediawiki.org/wiki/ORES/Thresholds is a good resource about the terminology.
I assume this will have some consequences for the usability of the filters? Is the model fixed, or are there ways to improve it? (Or perhaps it can improve itself based on what is being reverted, or on some other feed?)
Good question. I'm not sure, but I believe you may need to run another labeling campaign (https://www.mediawiki.org/wiki/ORES/Get_support#Advanced_edit_quality_support) to improve the model. Maybe @Halfak knows what the next step is from here.