Investigate srwiki goodfaith model: why is it so bad?
Open, High, Public

Description

In T197012, @Catrope discovered that our goodfaith model is unusable. Let's look at anomalies in the training data and try to solve the underlying issue, then train a new model.

Event Timeline

awight created this task. Jul 11 2018, 6:18 PM
Restricted Application removed a project: Patch-For-Review. Jul 11 2018, 6:18 PM

Support from me. One question: if we start training a new model, will ORES still be available in RC?

Yes, a new labeling campaign won't affect anything that's already deployed.

SBisson removed Catrope as the assignee of this task. Jul 12 2018, 1:30 PM
Harej moved this task from Untriaged to Ideas on the Scoring-platform-team board. Jul 23 2018, 5:07 PM
Harej moved this task from Ideas to Research & analysis on the Scoring-platform-team board.
awight removed a subscriber: awight. Mar 21 2019, 4:03 PM
Halfak added a comment. Apr 9 2019, 9:28 PM

Sorry this one was left hanging for a while. We'll be giving this priority and doing some exploration ASAP.

The first thing I want to try is re-reviewing the edits labeled "bad-faith" in the campaign to see if something weird is going on. I'll ping soon with some information about that.

Halfak triaged this task as High priority. Apr 9 2019, 9:28 PM

I've dumped all of the edits labeled "badfaith" into this etherpad: https://etherpad.wikimedia.org/p/srwiki_badfaith_edits

There are ~450 of them -- which is quite a lot. We don't need to re-label them all, but maybe we can check what is there. @Acamicamacaraca, I can do a bit of work using translation utilities, but I'd appreciate it if you could look at 25-50 of these and just write a short description of what you're seeing and if you agree with the label. I'd really be interested in any help from other srwiki-pedians too :)

I just labeled a few. I'm seeing some edits that look like they are goodfaith in this set. I wonder if I am missing something.

I labeled 30+ diffs.

I want to help. How can I save my edits to the list in the etherpad?

An etherpad is directly editable. You should be able to just type into it.

So, as it stands, more than half of the items labeled badfaith are actually goodfaith upon review. I'll look into these labels to see if I can see some sort of consistency with them.

It looks like a lot of the edits that were labeled "badfaith" but that we have now re-labeled "goodfaith" were saved by @Zoranzoki21. That might simply be because Zoranzoki21 did a lot of the labeling work. Would you take a look at them to see if you agree with our re-assessment? Maybe there is some confusion about the meaning of "goodfaith".
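One way to check whether the flipped labels cluster around particular labelers, rather than just tracking who labeled the most, is to compare each labeler's flip count against the number of badfaith labels they submitted. The sketch below is a minimal Python illustration; the record layout, the `flip_summary` helper, and the sample rows are all hypothetical stand-ins for whatever export of the campaign and etherpad labels is actually available.

```python
from collections import defaultdict

# Hypothetical review records (not real srwiki data):
# (rev_id, original_labeler, original_goodfaith, reviewed_goodfaith)
reviews = [
    (1001, "labeler_a", False, True),
    (1002, "labeler_a", False, True),
    (1003, "labeler_b", False, False),
    (1004, "labeler_a", False, False),
    (1005, "labeler_c", False, True),
]


def flip_summary(records):
    """Return the overall badfaith->goodfaith flip rate and, per labeler,
    a (flipped, total) pair for edits originally labeled badfaith."""
    per_labeler = defaultdict(lambda: [0, 0])  # labeler -> [flipped, total]
    flipped_total = 0
    badfaith_total = 0
    for _rev_id, labeler, orig_gf, reviewed_gf in records:
        if orig_gf:
            continue  # only originally-badfaith labels are under review
        badfaith_total += 1
        per_labeler[labeler][1] += 1
        if reviewed_gf:
            flipped_total += 1
            per_labeler[labeler][0] += 1
    rate = flipped_total / badfaith_total if badfaith_total else 0.0
    return rate, {name: tuple(counts) for name, counts in per_labeler.items()}


rate, by_labeler = flip_summary(reviews)
print(f"{rate:.0%} of reviewed badfaith labels flipped to goodfaith")
for labeler, (flipped, total) in sorted(by_labeler.items(), key=lambda kv: -kv[1][0]):
    print(labeler, flipped, "of", total)
```

Sorting by raw flip count alone would favor whoever labeled the most, which is exactly the caveat raised above; keeping the (flipped, total) pair makes it easy to compare rates instead.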

Yes, thanks!

I checked the diffs on lines 59 to 70 and will check the others too.

OK, it's clear that we would benefit from re-labeling these 500 revisions using Wiki Labels. I'm working to get a campaign loaded. I'd like to call it something like "Edit quality (500 edits re-review)". Could someone help me with a Serbian translation of that?

In the meantime, I added the campaign here: https://labels.wmflabs.org/ui/srwiki/ Please pick up these edits and re-label them as we were doing in the etherpad. Once we're done with this, we can re-examine the data and update the training/testing set.

Restricted Application added a project: artificial-intelligence. Tue, Apr 30, 8:57 PM

@Halfak Thanks! I'm working on it now. The Serbian translation is: "Квалитет измена (поновни преглед 500 измена)"

257 labels left. I will finish them by the end of the day.

I had some problems at the end, but I talked with @Halfak on IRC and he resolved them, so I successfully completed all of the labels.

Just sat down with this again. Here's the old dataset:

edits     damaging   goodfaith
10        False      False
119212    False      True
447       True       False
225       True       True

And the new re-labeled dataset:

edits     damaging   goodfaith
0         False      False
119469    False      True
151       True       False
274       True       True

Just at a glance, this looks far more reasonable. In the original dataset, 10 edits were labeled as not damaging but still "badfaith". Those have now disappeared, and we've gone from 447 damaging, badfaith edits to 151.
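The before/after comparison can be reproduced directly from the counts in the two tables. The following is a minimal Python sketch; the dict layout and the `summarize` helper are illustrative only and are not taken from the editquality repository.

```python
# Counts copied from the two tables above, keyed by (damaging, goodfaith).
OLD = {(False, False): 10, (False, True): 119212, (True, False): 447, (True, True): 225}
NEW = {(False, False): 0, (False, True): 119469, (True, False): 151, (True, True): 274}


def summarize(counts):
    """Collapse a (damaging, goodfaith) -> count table into a few headline numbers."""
    return {
        "total": sum(counts.values()),
        "damaging": sum(n for (dmg, _gf), n in counts.items() if dmg),
        "badfaith": sum(n for (_dmg, gf), n in counts.items() if not gf),
        # Badfaith yet not damaging: the odd combination noted in the comment.
        "badfaith_not_damaging": counts.get((False, False), 0),
    }


before, after = summarize(OLD), summarize(NEW)
for key in before:
    print(f"{key:>22}: {before[key]:>6} -> {after[key]:>6}")
```

Both datasets sum to 119,894 edits, so the re-review changed labels without adding or dropping revisions. Counting every goodfaith=False row gives 457 -> 151; the 447 figure is the damaging-and-badfaith cell on its own.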

I'm re-training the models now. I'll report back tomorrow on the fitness we get.

Huge boost in model fitness! This is now one of the best "goodfaith" models that we have! I've submitted my work for review. See https://github.com/wikimedia/editquality/pull/195 Will update about deployments of the new model when that is ready.

Wow. From mud to gold :)

This rhymes in Serbian :D