
Investigate srwiki goodfaith model, why is it so bad?
Closed, Resolved, Public

Description

In T197012, @Catrope discovered that our goodfaith model is unusable. Let's look at anomalies in the training data and try to solve the underlying issue, then train a new model.

Event Timeline

Support from me. One question: if we start training a new model, will ORES still be available in RecentChanges (RC)?

In T199355#4417907, @Acamicamacaraca wrote:

One question: if we start training a new model, will ORES still be available in RecentChanges (RC)?

Yes, a new labeling campaign won't affect anything that's already deployed.

Sorry this one was left hanging for a while. We'll be giving this priority and doing some exploration ASAP.

The first thing I want to try is re-reviewing the edits labeled "bad-faith" in the campaign to see if something weird is going on. I'll ping soon with some information about that.

Halfak triaged this task as High priority. Apr 9 2019, 9:28 PM

I've dumped all of the edits labeled "badfaith" into this etherpad: https://etherpad.wikimedia.org/p/srwiki_badfaith_edits

There are ~450 of them, which is quite a lot. We don't need to re-label them all, but maybe we can check what is there. @Acamicamacaraca, I can do a bit of work using translation utilities, but I'd appreciate it if you could look at 25-50 of these and write a short description of what you're seeing and whether you agree with the label. I'd really be interested in any help from other srwiki-pedians too :)

I just labeled a few. I'm seeing some edits that look like they are goodfaith in this set. I wonder if I am missing something.

I want to help. How can I save edits to the list on the etherpad?

An etherpad is directly editable. You should be able to just type into it.

So, as it stands, more than half of the items labeled badfaith are actually goodfaith upon review. I'll look into these labels to see if I can see some sort of consistency with them.

It looks like a lot of the edits that were labeled "badfaith" but that we have now re-labeled "goodfaith" were saved by @Zoranzoki21. That might simply be because Zoranzoki21 did a lot of labeling work. Would you take a look at them to see if you agree with our re-assessment? Maybe there is some confusion about the meaning of "goodfaith".
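The per-labeler check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling actually used in this task: the `Review` record, its fields, and the user names in the example are all hypothetical. It counts, for each original labeler, how many "badfaith" labels were overturned to "goodfaith" on re-review, and also normalizes by how many badfaith labels that person gave, since a high raw count may simply reflect a lot of labeling work.

```python
from collections import Counter
from typing import NamedTuple


class Review(NamedTuple):
    """One re-reviewed edit (hypothetical record layout)."""
    rev_id: int
    original_labeler: str
    original_goodfaith: bool
    reviewed_goodfaith: bool


def flips_by_labeler(reviews):
    """Count edits originally labeled badfaith but re-labeled goodfaith, per labeler."""
    flips = Counter()
    for r in reviews:
        if not r.original_goodfaith and r.reviewed_goodfaith:
            flips[r.original_labeler] += 1
    return flips


def flip_rate_by_labeler(reviews):
    """Fraction of each labeler's badfaith labels that were overturned.

    Normalizing by the number of badfaith labels each person gave keeps a
    prolific labeler from standing out just because of volume.
    """
    totals = Counter(r.original_labeler for r in reviews if not r.original_goodfaith)
    flips = flips_by_labeler(reviews)
    # A Counter returns 0 for missing keys, so labelers with no flips get 0.0.
    return {user: flips[user] / n for user, n in totals.items()}


# Hypothetical example data: UserA had 2 of 3 badfaith labels overturned,
# UserB had 0 of 1.
reviews = [
    Review(1, "UserA", False, True),
    Review(2, "UserA", False, True),
    Review(3, "UserA", False, False),
    Review(4, "UserB", False, False),
]
print(flips_by_labeler(reviews))
print(flip_rate_by_labeler(reviews))
```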

Yes, thanks!

I checked the diffs on lines 59 to 70. I'll check the others too.

OK, it's clear that we would benefit from re-labeling these 500 revisions using Wiki labels. I'm working on getting a campaign loaded. I'd like to call it something like "Edit quality (500 edits re-review)". Could someone help me with a Serbian translation of that?

In the meantime, I added the campaign here: https://labels.wmflabs.org/ui/srwiki/ Please pick up these edits and re-label them as we were doing in the etherpad. Once we're done with this, we can re-examine the data and update the training/testing set.

@Halfak Thanks! I'm working on it now. The Serbian translation is: "Квалитет измена (поновни преглед 500 измена)" (Edit quality (re-review of 500 edits)).

257 labels left. I will finish this by the end of the day.

I had some problems at the end, but I talked with @Halfak on IRC and he resolved them, so I successfully completed all of them.

Just sat down with this again. Here's the old dataset:

| edits | damaging | goodfaith |
| --- | --- | --- |
| 10 | False | False |
| 119212 | False | True |
| 447 | True | False |
| 225 | True | True |

And the new re-labeled dataset:

| edits | damaging | goodfaith |
| --- | --- | --- |
| 0 | False | False |
| 119469 | False | True |
| 151 | True | False |
| 274 | True | True |

At a glance, this looks much more reasonable. In the original labels, 10 edits were labeled as not damaging but still "badfaith". Those have now disappeared, and the number of edits labeled both damaging and badfaith has gone from 447 to 151.
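A minimal Python sketch of building and comparing the two tables above, assuming each labeled edit is reduced to a (damaging, goodfaith) pair of booleans. The helper names are hypothetical and this is not the editquality code; the counts are copied from the tables. Note that `badfaith_total` also counts the 10 not-damaging badfaith edits, so it reports 457 for the old data rather than the 447 damaging-and-badfaith cell.

```python
from collections import Counter

# Counts copied from the two tables above, keyed by (damaging, goodfaith).
OLD = {(False, False): 10, (False, True): 119212,
       (True, False): 447, (True, True): 225}
NEW = {(False, False): 0, (False, True): 119469,
       (True, False): 151, (True, True): 274}


def tally(labels):
    """Build the contingency table from an iterable of (damaging, goodfaith) pairs."""
    return Counter(labels)


def badfaith_total(table):
    """All edits labeled badfaith, whether or not they were also labeled damaging."""
    return sum(n for (_, goodfaith), n in table.items() if not goodfaith)


def diff(old, new):
    """Per-cell change from the old labels to the new ones."""
    return {key: new.get(key, 0) - old.get(key, 0)
            for key in sorted(set(old) | set(new))}


# Re-labeling moves edits between cells; it should not change the total.
assert sum(OLD.values()) == sum(NEW.values())
print(badfaith_total(OLD), badfaith_total(NEW))  # 457 151
print(diff(OLD, NEW))
# {(False, False): -10, (False, True): 257, (True, False): -296, (True, True): 49}
```

Both tables sum to 119894 edits, which is what you would expect if the re-review only moved edits between cells.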

I'm re-training the models now. I'll report back tomorrow on the fitness we get.

Huge boost in model fitness! This is now one of the best "goodfaith" models we have! I've submitted my work for review; see https://github.com/wikimedia/editquality/pull/195. I'll post an update about deploying the new model when it's ready.

In T199355#5193386, @Acamicamacaraca wrote:

Wow. From mud to gold :)

This rhymes in Serbian :D

What about the damaging model? I forgot to ask whether it is good. We already have it enabled on sr.wiki.

We got a minor improvement for the "damaging" model too, but the change is really too small to be meaningful.

FYI, I'm still waiting on review for this change. My team is a bit understaffed at the moment, so I need to rely on external reviewers. Sorry for the delay!

Can I notify the community about this? I saw that you created a patch on Gerrit.

It looks like we're going to get this deployed next week. I'm aiming for Monday, June 17th.

We've been blocked for a while on a few issues. E.g. an issue with our source code control/deployment system (T224996) and now we're blocked on deployment while the "Site Reliability Engineering" team has an offsite this week.

If you'd like to make an announcement, I think that is a great idea. Let me know how I can help.

I informed the community. I hope you can deploy this by next week despite the issues you've had. Best regards!

Actually, we just deployed from our side on Monday. We're now waiting on the Growth-Team to enable the filters in RecentChanges. But if you use Huggle or RTRC, you should be able to see ORES predictions right away.

Hey! User intent filters are not yet displayed in Recent Changes (screenshot). Since Halfak said we're waiting for the Growth team now, @Trizek-WMF do you know anything about this and when it should be deployed?

In T199355#6213087, @Acamicamacaraca wrote:

Hey! User intent filters are not yet displayed in Recent Changes (screenshot). Since Halfak said we're waiting for the Growth team now, @Trizek-WMF do you know anything about this and when it should be deployed?

Is there a dedicated task about implementing them?

@Trizek-WMF What about T223273?