
Enable ORES filters on RC for Italian Wikipedia
Closed, Resolved, Public

Description

See https://ores.wikimedia.org/v3/scores/itwiki

ORES models are ready. We should deploy the filters.

See https://labels.wmflabs.org/stats/itwiki/18 for those who labeled the most edits to train ORES. These contributors will be most likely to be interested in helping us announce the new filters.

We've worked with @Rotpunkt in the past to improve ORES for itwiki so he might be interested in helping out.

Event Timeline

Halfak created this task. Dec 3 2018, 5:14 PM
Restricted Application added a subscriber: Aklapper. Dec 3 2018, 5:14 PM
Restricted Application added a project: artificial-intelligence. Feb 5 2019, 9:56 PM

Change 489934 had a related patch set uploaded (by Catrope; owner: Catrope):
[operations/mediawiki-config@master] Enable ORES (damaging-only) on itwiki

https://gerrit.wikimedia.org/r/489934

I've scheduled the above patch for deployment on Thursday February 14 at 00:00-01:00 UTC (Wednesday 16:00-17:00).

I'm only enabling the damaging model, because there seems to be something wrong with the goodfaith model. It almost looks like true and false got switched somewhere. Here are the precision/recall graphs for goodfaith=false and goodfaith=true (generated with Adam's new tool):

Since the vast majority of edits are good faith, you'd expect the precision line (light blue) to start high and creep up in the true graph (bottom), and for it to climb up from near zero in the false graph (top), but what actually happens is the reverse. The graphs for other wikis' goodfaith models do look as expected (including dewiki, which is also new).

I did some analysis and found that it is a configuration issue. Still, we'll need to retrain the model.

Hmm. In the model stats for goodfaith, I see:

"counts": {
  "labels": {
    "false": 355,
    "true": 18190
  },

So that suggests that the labels aren't too crazy. I agree that the graph looks strange.
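The label counts can be turned into the class proportions they imply, which is what the population rates should roughly look like. A minimal sketch, assuming the labeled set is a reasonable proxy for the population of edits (an assumption; the stats do not guarantee it), with `label_rates` as a hypothetical helper:

```python
# Label counts copied from the goodfaith model stats above.
counts = {"false": 355, "true": 18190}

def label_rates(counts):
    """Return each label's share of the labeled observations, rounded to 3 places.

    Assumption: the labeled sample approximates the real population of edits.
    """
    total = sum(counts.values())
    return {label: round(n / total, 3) for label, n in counts.items()}

print(label_rates(counts))  # {'false': 0.019, 'true': 0.981}
```

About 1.9% of the labeled edits are not in good faith, so the population rate for `false` should be the small number.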

When I try an optimization for "maximum precision @ recall >= 0.9", I get:

  "false": [
    {
      "!f1": 0.244,
      "!precision": 0.143,
      "!recall": 0.837,
      "accuracy": 0.9,
      "f1": 0.947,
      "filter_rate": 0.113,
      "fpr": 0.163,
      "match_rate": 0.887,
      "precision": 0.996,
      "recall": 0.901,
      "threshold": 0.058
    }
  ]
}

This shows a low threshold, but great fitness statistics (precision = 0.996 @ recall of 0.901). But how could this be true with a match rate of 0.887?

Aha! I got it! It looks like the population rates were not specified correctly!

"rates": {
  "population": {
    "false": 0.981,
    "true": 0.019
  },

Obviously this is backwards and it is messing up the calculations. We'll need to retrain this model. I'll have a PR shortly.
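The effect of the swapped rates can be checked by hand. Recall and false-positive rate (fpr) are measured within each class, so they do not depend on the population rate; precision and match rate do. A minimal sketch, assuming the reported statistics are rescaled by the configured population rate p of the target class (`false` here) using the standard identities match_rate = p·recall + (1 − p)·fpr and precision = p·recall / match_rate. `rescaled_stats` is a hypothetical helper, not ORES code:

```python
def rescaled_stats(p, recall, fpr):
    """Precision and match rate for the target class, given its population rate p.

    Assumption: statistics are rescaled to the population via Bayes' rule.
    recall: share of target-class edits scored at or above the threshold.
    fpr: share of other-class edits scored at or above the threshold.
    """
    true_pos = p * recall        # population share correctly flagged
    false_pos = (1 - p) * fpr    # population share wrongly flagged
    match_rate = true_pos + false_pos
    precision = true_pos / match_rate
    return precision, match_rate

# Recall and fpr as reported for goodfaith=false at threshold 0.058.
recall, fpr = 0.901, 0.163

for p in (0.981, 0.019):  # backwards rate, then the rate the label counts suggest
    precision, match_rate = rescaled_stats(p, recall, fpr)
    print(f"p={p}: precision={precision:.3f}, match_rate={match_rate:.3f}")
# p=0.981: precision=0.997, match_rate=0.887
# p=0.019: precision=0.097, match_rate=0.177
```

With the backwards rate the sketch reproduces the reported match rate (0.887) and, within input rounding, the reported precision (0.996). With p = 0.019 the same threshold would flag about 18% of edits at under 10% precision. The reported !recall (0.837) is 1 − fpr, and filter_rate (0.113) is 1 − match_rate.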

@Catrope, we can probably get this out in the next deployment -- which could be tomorrow or Thursday. I hope that won't be too disruptive. Thanks for your report and your patience!

Catrope added a comment. Edited Feb 13 2019, 7:51 AM

> @Catrope, we can probably get this out in the next deployment -- which could be tomorrow or Thursday. I hope that won't be too disruptive. Thanks for your report and your patience!

Thanks! As long as this only rebuilds itwiki goodfaith, and doesn't touch any of the other models (including itwiki damaging), I can just follow up with a second patch enabling the goodfaith model for RCFilters on itwiki after your deployment. If you do change itwiki's damaging model (or any other wikis' damaging/goodfaith models), then please tell me, because in that case I'd like to look at the updated models before I deploy anything.

Thanks to you all, this is great! I'm fully available in case you need any testing/help/whatever on itwiki, just let me know (unfortunately Rotpunkt has retired). Should I write an announcement for the community, or have you already prepared one?

Maybe @Trizek-WMF could point us to texts he used for past announcements. If not, I can draft something.

Daimona added a comment. Edited Feb 13 2019, 5:12 PM

@Halfak thanks, I'm going to translate and publish it.
EDIT: done.

Titore added a subscriber: Titore. Feb 14 2019, 12:21 AM

Change 489934 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable ORES (damaging-only) on itwiki

https://gerrit.wikimedia.org/r/489934

Mentioned in SAL (#wikimedia-operations) [2019-02-14T00:32:57Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Enable ORES (damaging only) on itwiki (T211032) (duration: 00m 53s)

Catrope closed this task as Resolved. Feb 14 2019, 12:33 AM
Catrope claimed this task.
Catrope reopened this task as Open. Feb 14 2019, 12:37 AM

Whoops, I guess I shouldn't close this yet, because goodfaith isn't done yet. In any case, damaging is now live.

We're delayed on getting this deployment out the door because of one model we forgot to rebuild. We'll likely have the Italian goodfaith ready early next week.

@Catrope Hey, the model is live. Please check if it works fine :)

Catrope removed Catrope as the assignee of this task.

Based on @Catrope's documentation, this is how I propose to configure the goodfaith model's levels for itwiki:

| Level | Default config | Default precision | Default recall | Default range | Proposed config | Proposed precision | Proposed recall | Proposed range |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| likelygood | maximum recall @ precision >= 0.995 | 0.995 | 0.913 | 0.87 - 1 | maximum recall @ precision >= 0.99 | 0.99 | 0.978 | 0.665 - 1 |
| maybebad | maximum filter_rate @ recall >= 0.9 | ? | ? | ? | maximum recall @ precision >= 0.15 | 0.152 | 0.763 | 0 - 0.865 |
| likelybad | maximum recall @ precision >= 0.6 | 0.602 | 0.228 | 0 - 0.343 | use default | - | - | - |
| verylikelybad | false (disabled) | - | - | - | maximum recall @ precision >= 0.9 | 0.919 | 0.127 | 0 - 0.1517 |
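Each config string in the table names an optimization over the model's per-threshold statistics. A minimal sketch of what "maximum recall @ precision >= X" means, assuming the statistics are available as a list of rows with threshold, precision, and recall; `pick_threshold` is a hypothetical helper and the rows are made-up illustration values, not itwiki's real numbers:

```python
from typing import Optional

def pick_threshold(rows, min_precision) -> Optional[dict]:
    """Among rows with precision >= min_precision, return the one with maximum recall.

    rows: dicts with "threshold", "precision" and "recall" keys (hypothetical format).
    Returns None when no threshold reaches the requested precision.
    """
    eligible = [r for r in rows if r["precision"] >= min_precision]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["recall"])

# Illustrative rows only (not real itwiki statistics).
rows = [
    {"threshold": 0.95, "precision": 0.998, "recall": 0.80},
    {"threshold": 0.90, "precision": 0.995, "recall": 0.90},
    {"threshold": 0.80, "precision": 0.990, "recall": 0.96},
    {"threshold": 0.60, "precision": 0.985, "recall": 0.99},
]

print(pick_threshold(rows, 0.995)["threshold"])  # 0.9
print(pick_threshold(rows, 0.99)["threshold"])   # 0.8
```

Lowering the precision target admits more thresholds, so recall can only stay the same or grow. That is the trade-off in the likelygood row: going from 0.995 to 0.99 raises recall from 0.913 to 0.978 but widens the range from 0.87 - 1 to 0.665 - 1.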
Halfak added a comment. Mar 1 2019, 3:59 PM

Looks solid to me.

This looks fine. I understand every choice you made within 3 seconds of looking at it, except for the likelygood level: why did you move it down from 0.995 to 0.99 when the recall figures aren't much different, and this creates overlap with maybebad that wouldn't otherwise exist?

> This looks fine. I understand every choice you made within 3 seconds of looking at it, except for the likelygood level: why did you move it down from 0.995 to 0.99 when the recall figures aren't much different, and this creates overlap with maybebad that wouldn't otherwise exist?

I was under the impression an overlap between likelygood and maybebad was desirable. Otherwise there's a small gap: an edit with a score of 0.866 doesn't match any level.

We can also just keep the default, which appears to be performing quite well.
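The gap and the overlap can be made concrete by treating each level as a score range on the goodfaith "true" probability and listing which levels a score falls into. A minimal sketch using the range bounds from the proposal table; `matching_levels` is a hypothetical helper, and treating the ranges as closed intervals is an assumption:

```python
# Score ranges (min, max) on the goodfaith "true" probability, from the table.
DEFAULT_LIKELYGOOD = (0.87, 1.0)
PROPOSED_LIKELYGOOD = (0.665, 1.0)
PROPOSED_MAYBEBAD = (0.0, 0.865)

def matching_levels(score, levels):
    """Return the names of all levels whose closed range contains score."""
    return [name for name, (lo, hi) in levels.items() if lo <= score <= hi]

with_default_likelygood = {"likelygood": DEFAULT_LIKELYGOOD, "maybebad": PROPOSED_MAYBEBAD}
with_proposed_likelygood = {"likelygood": PROPOSED_LIKELYGOOD, "maybebad": PROPOSED_MAYBEBAD}

print(matching_levels(0.866, with_default_likelygood))   # [] (falls in the gap)
print(matching_levels(0.866, with_proposed_likelygood))  # ['likelygood']
print(matching_levels(0.7, with_proposed_likelygood))    # ['likelygood', 'maybebad']
```

Closing the 0.005-wide gap between 0.865 and 0.87 this way creates an overlap from 0.665 to 0.865.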

Overlap between those two is tolerated, but not necessarily desired. We certainly shouldn't go out of our way to engineer one. If this is unclear in my documentation, feel free to suggest edits (or preferably make edits yourself, it's a wiki after all).

I'd be in favor of keeping the default for likelygood.

Change 493749 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[operations/mediawiki-config@master] Enable and configure the ORES goodfaith model on itwiki

https://gerrit.wikimedia.org/r/493749

Change 493749 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable and configure the ORES goodfaith model on itwiki

https://gerrit.wikimedia.org/r/493749

Change 494306 had a related patch set uploaded (by Catrope; owner: Catrope):
[operations/mediawiki-config@master] Reapply "Enable and configure the ORES goodfaith model on itwiki""

https://gerrit.wikimedia.org/r/494306

Change 494306 merged by jenkins-bot:
[operations/mediawiki-config@master] Reapply "Enable and configure the ORES goodfaith model on itwiki""

https://gerrit.wikimedia.org/r/494306

Mentioned in SAL (#wikimedia-operations) [2019-03-05T00:40:58Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Enable ORES goodfaith on itwiki (T211032) (duration: 00m 47s)

Catrope closed this task as Resolved. Mar 5 2019, 12:48 AM
Catrope reopened this task as Open.
Catrope moved this task from Code Review to QA on the Growth-Team (Current Sprint) board.
Etonkovidova closed this task as Resolved. Mar 5 2019, 9:24 PM