Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T227094 Update RC Filters for new ORES capacities (July, 2019)
Resolved | | SBisson | T225561 Update ORES thresholds for nlwiki
Open | | None | T223273 Update srwiki thresholds for goodfaith model
Resolved | | SBisson | T225562 Deploy ORES filters for zhwiki
Open | | None | T225563 Deploy ORES filters for jawiki
Resolved | | Halfak | T224484 ORES deployment: Early June
Resolved | | Catrope | T197012 Enable srwiki edit quality filters in RecentChanges
Resolved | | Halfak | T199355 Investigate srwiki goodfaith model, why is it so bad?
Resolved | | None | T220556 New labeling campaign for srwiki
Event Timeline
Support from me. One question: if we start training a new model, will ORES still be available in RC?
Sorry this one was left hanging for a while. We'll give it priority and do some exploration ASAP.
The first thing I want to try is re-reviewing the edits labeled "bad-faith" in the campaign to see if something weird is going on. I'll ping you soon with more information about that.
I've dumped all of the edits labeled "badfaith" into this etherpad: https://etherpad.wikimedia.org/p/srwiki_badfaith_edits
There are ~450 of them, which is quite a lot. We don't need to re-label them all, but maybe we can check what is there. @Acamicamacaraca, I can do a bit of work using translation utilities, but I'd appreciate it if you could look at 25-50 of these and write a short description of what you're seeing and whether you agree with the label. I'd really appreciate any help from other srwiki-pedians too :)
I just labeled a few. I'm seeing some edits that look like they are goodfaith in this set. I wonder if I am missing something.
So, as it stands, more than half of the items labeled badfaith are actually goodfaith upon review. I'll look into these labels to see if I can find some sort of pattern in them.
It looks like a lot of the edits that were labeled "badfaith" but that we have now re-labeled "goodfaith" were saved by @Zoranzoki21. That might simply be because Zoranzoki21 did a lot of labeling work. Would you take a look at them to see if you agree with our re-assessment? Maybe there is some confusion about the meaning of "goodfaith".
OK, it's clear that we would benefit from re-labeling these 500 revisions using Wiki labels. I'm working to get a campaign loaded. I'd like to call it something like "Edit quality (500 edits re-review)". Could someone help me get a Serbian translation of that?
In the meantime, I added the campaign here: https://labels.wmflabs.org/ui/srwiki/ Please pick up these edits and re-label them as we were doing in the etherpad. Once we're done with this, we can re-examine the data and update the training/testing set.
@Halfak Thanks! I'm working on it now. The Serbian translation is: "Квалитет измена (поновни преглед 500 измена)"
I had some problems at the end, but I talked with @Halfak on IRC, he resolved them, and I successfully completed all of the edits.
Just sat down with this again. Here's the old dataset:
edits | damaging | goodfaith
---|---|---
10 | False | False
119212 | False | True
447 | True | False
225 | True | True
And the new re-labeled dataset:
edits | damaging | goodfaith
---|---|---
0 | False | False
119469 | False | True
151 | True | False
274 | True | True
Just at a glance, this looks way more reasonable. In the original dataset, we had 10 edits labeled as not damaging but still "badfaith". Now those have disappeared, and we've gone from 447 edits labeled both damaging and badfaith to 151.
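The two tables can be cross-checked with a little arithmetic. The minimal Python sketch below uses only the counts from those tables: it confirms that both datasets contain the same number of edits (so the campaign moved labels between cells rather than adding or dropping edits) and computes the per-cell changes and the total number of badfaith edits, counting both damaging and non-damaging ones. The names `OLD`, `NEW`, `badfaith_total` and `cell_deltas` are illustrative only and are not part of the editquality codebase.

```python
"""Compare the srwiki label distributions before and after re-labeling.

Counts are copied from the two tables in this thread; everything else is
plain arithmetic on those aggregates (no per-edit data is assumed).
"""

# (damaging, goodfaith) -> number of edits
OLD = {(False, False): 10, (False, True): 119212, (True, False): 447, (True, True): 225}
NEW = {(False, False): 0, (False, True): 119469, (True, False): 151, (True, True): 274}


def badfaith_total(counts: dict) -> int:
    """Count edits labeled goodfaith=False, whatever their damaging label."""
    return sum(n for (_damaging, goodfaith), n in counts.items() if not goodfaith)


def cell_deltas(before: dict, after: dict) -> dict:
    """Per-cell change in edit counts (after minus before)."""
    return {cell: after[cell] - before[cell] for cell in before}


if __name__ == "__main__":
    # Re-labeling should only move edits between cells, not add or drop any.
    assert sum(OLD.values()) == sum(NEW.values())
    print("total edits:", sum(OLD.values()))
    for (damaging, goodfaith), d in cell_deltas(OLD, NEW).items():
        print(f"damaging={damaging!s:5} goodfaith={goodfaith!s:5} change={d:+d}")
    old_bf, new_bf = badfaith_total(OLD), badfaith_total(NEW)
    # Net drop only: aggregates cannot show edits that moved in both directions.
    print(f"badfaith: {old_bf} -> {new_bf} (net drop {(old_bf - new_bf) / old_bf:.1%})")
```

Both tables sum to 119,894 edits, and total badfaith labels drop from 457 to 151 (a net drop of 67.0%). Because only aggregate counts are available, this is a net figure; it cannot show how many individual edits moved in each direction.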
I'm re-training the models now. I'll report back tomorrow on the fitness we get.
Huge boost in model fitness! This is now one of the best "goodfaith" models that we have! I've submitted my work for review. See https://github.com/wikimedia/editquality/pull/195 I'll post an update about deploying the new model when that is ready.
What about the damaging model? I forgot to ask whether it is good. We have already enabled it on sr.wiki.
We got a minor improvement for the "damaging" model too, but the change is really too small to be meaningful.
FYI, I'm still waiting on review for this change. My team is a bit understaffed at the moment, so I need to rely on external reviewers. Sorry for the delay!
It looks like we're going to get this deployed next week. I'm aiming for Monday, June 17th.
We've been blocked for a while on a few issues, e.g., an issue with our source code control/deployment system (T224996), and now we're blocked on deployment because the "Site Reliability Engineering" team has an offsite this week.
If you'd like to make an announcement, I think that is a great idea. Let me know how I can help.
I informed the community. I hope you can deploy this by next week despite the issues you're having. Best regards!
Actually, we just deployed from our side on Monday. We're now waiting on the Growth-Team to enable the filters in RecentChanges. But if you use Huggle or RTRC, you should be able to see ORES predictions right away.
Hey! User intent filters are not yet displayed in Recent Changes (screenshot). Since Halfak said we're waiting for the Growth team now, @Trizek-WMF do you know anything about this and when it should be deployed?