
Experiment with using English Wikipedia models on Simple English
Closed, Resolved · Public

Description

Once the Simple English ORES models are enabled on the beta cluster, please copy a few sample edits over from simplewiki, both vandalism and good edits. Smoke-test the scores to see whether we need to adjust thresholds, and check whether our features are appropriate.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Halfak renamed this task from "Add language support for Simple English" to "Experiment with using English Wikipedia models on Simple English". Dec 2 2017, 4:46 PM
Halfak updated the task description.

I'm repurposing this task to set up the English Wikipedia models on Simplewiki because I think it is worth a try.

WMFLabs: https://github.com/wiki-ai/ores-wmflabs-deploy/pull/93 (merged)
Prod: https://gerrit.wikimedia.org/r/394759

Change 394759 had a related patch set uploaded (by Halfak; owner: halfak):
[mediawiki/services/ores/deploy@master] Use enwiki models on simplewiki.

https://gerrit.wikimedia.org/r/394759

OK so here's what I suggest you do.

  1. Disable the new recent changes filters in your preferences.
  2. Edit your "/common.js" to look like mine: https://simple.wikipedia.org/wiki/User:EpochFail/common.js
  3. Go back to Special:RecentChanges and wait a little bit. ORES should highlight changes that are likely to be damaging.
  4. Tell us how it goes!

I did a little bit of testing and I was able to catch some damaging edits :)

BTW, if this works out OK, we'll get ORES enabled for the fancy new recent changes filters too.

I tested it out a bit. It works fine: most edits it marked were vandalism, though quite a few were not. For example, it marked https://simple.wikipedia.org/w/index.php?title=JAY-Z&curid=75975&diff=5906288&oldid=5906287 as vandalism when the edit only changed the infobox type, and it marked this edit in red: https://simple.wikipedia.org/w/index.php?title=Drake_(entertainer)&curid=210822&diff=5906280&oldid=5905690. So some work could be done, but it did not fail to mark any edits that were vandalism.

An important distinction: ORES does not "mark something as vandalism". Instead, it marks something as "needing review". It's still good to note when the review concludes that the edit was fine. But it's important to read the coloring as "there's something that looks funny about this" rather than "there's something wrong with this".

I'm glad to read that it is useful. We'll start moving forward with a deployment.

@awight, could you look at https://gerrit.wikimedia.org/r/394759 ?

It would be cool if this could go out in a deployment soon.

Change 394759 merged by Awight:
[mediawiki/services/ores/deploy@master] Use enwiki models on simplewiki.

https://gerrit.wikimedia.org/r/394759

Mentioned in SAL (#wikimedia-cloud) [2017-12-04T17:44:32Z] <awight> ORES: Try enwiki models on simplewiki, T181848 (6baed71)

Change 395052 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] Try simplewiki ORES on beta.

https://gerrit.wikimedia.org/r/395052

This change is deployed to the beta service, e.g. https://ores-beta.wmflabs.org/v3/scores/simplewiki/12345
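For anyone smoke-testing the beta service by hand, the two URL shapes used in this task (the path form above and the query form with pipe-separated `models`/`revids` used later) can be built as follows. This is a minimal Python sketch; `single_score_url`, `batch_score_url` and `BETA_HOST` are hypothetical helpers, not part of ORES, and only the parameters that appear in this task's URLs are used.

```python
from urllib.parse import urlencode

# Host of the beta ORES service mentioned above (assumed default here).
BETA_HOST = "https://ores-beta.wmflabs.org"


def single_score_url(wiki, rev_id, host=BETA_HOST):
    """Path form, as in .../v3/scores/simplewiki/12345."""
    return f"{host}/v3/scores/{wiki}/{rev_id}"


def batch_score_url(wiki, rev_ids, models, extra=None, host=BETA_HOST):
    """Query form: one request for several models and revisions."""
    params = {
        "models": "|".join(models),  # pipe-separated, sent as %7C
        "revids": "|".join(str(r) for r in rev_ids),
    }
    params.update(extra or {})  # e.g. {"precache": "true"}
    params.setdefault("format", "json")
    # urlencode percent-encodes "|" and keeps the insertion order of the keys.
    return f"{host}/v3/scores/{wiki}/?{urlencode(params)}"
```

For example, `batch_score_url("simplewiki", [3266888], ["damaging", "goodfaith"], extra={"precache": "true"}, host="http://ores-beta.wmflabs.org")` reproduces the precache request quoted further down in this task.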

The next steps are to enable the ORES UI and precaching on the beta cluster simplewiki, then if that looks good continue with the production service and config.

Change 395059 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] Enable ORES on simplewiki

https://gerrit.wikimedia.org/r/395059

Change 395052 merged by jenkins-bot:
[operations/mediawiki-config@master] Try simplewiki ORES on beta.

https://gerrit.wikimedia.org/r/395052

Change 395066 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] Add ORES filter thresholds for simplewiki

https://gerrit.wikimedia.org/r/395066

Change 395066 merged by jenkins-bot:
[operations/mediawiki-config@master] Add ORES filter thresholds for simplewiki

https://gerrit.wikimedia.org/r/395066

This is on the beta wiki, but I'm not going to proceed further today because something's missing:
https://simple.wikipedia.beta.wmflabs.org/wiki/Special:RecentChanges

I think scores aren't being cached into the MediaWiki database yet. Oh, we probably have to run a database migration?

Ran into an issue:

This test change,
https://simple.wikipedia.beta.wmflabs.org/w/index.php?diff=3266888
cannot be found by the extractor:
http://ores-beta.wmflabs.org/v3/scores/simplewiki/?models=damaging%7Cgoodfaith&revids=3266888&precache=true&format=json

{"simplewiki": {"models": {"damaging": {"version": "0.4.0"}, "goodfaith": {"version": "0.4.0"}}, "scores": {"3266888": {"damaging": {"error": {"message": "RevisionNotFound: Could not find revision ({revision}:3266888)", "type": "RevisionNotFound"}}, "goodfaith": {"error": {"message": "RevisionNotFound: Could not find revision ({revision}:3266888)", "type": "RevisionNotFound"}}}}}}

https://simple.wikipedia.org/w/index.php?diff=3266888 doesn't exist. It's trying to score the revision *on* the production Simple English Wikipedia rather than on the beta wiki.
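As the response above shows, a missing revision is reported inside the JSON body, separately for each revision and model, rather than as a missing `scores` entry. A minimal Python sketch for pulling those errors out of a v3 response, assuming the layout shown above (wiki → `scores` → revision ID → model → `error`); `score_errors` is a hypothetical helper:

```python
import json


def score_errors(response_text, wiki):
    """Map (rev_id, model) -> (error type, message) for every failed score."""
    data = json.loads(response_text)
    errors = {}
    for rev_id, per_model in data[wiki]["scores"].items():
        for model, result in per_model.items():
            # Successful scores carry a "score" object; failures carry "error".
            if "error" in result:
                err = result["error"]
                errors[(rev_id, model)] = (err["type"], err["message"])
    return errors
```

Run against the response above, this yields `RevisionNotFound` for both `damaging` and `goodfaith` on revision 3266888.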

Aha, thanks!

On to the next puzzle. All four thresholds were appearing yesterday, but today only one appears on Special:RecentChanges:
https://simple.wikipedia.beta.wmflabs.org/wiki/Special:RecentChanges

The API response is correct for http://ores-beta.wmflabs.org/v3/scores/simplewiki/?models=damaging&model_info=statistics.thresholds.false.%22maximum+recall+%40+precision+%3E%3D+0.995%22%7Cstatistics.thresholds.true.%22maximum+filter_rate+%40+recall+%3E%3D+0.9%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.6%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.9%22&format=json

{"simplewiki": {"models": {"damaging": {"statistics": {"thresholds": {"false": [{"!f1": 0.236, "!precision": 0.136, "!recall": 0.887, "accuracy": 0.804, "f1": 0.888, "filter_rate": 0.222, "fpr": 0.113, "match_rate": 0.778, "precision": 0.995, "recall": 0.801, "threshold": 0.899}], "true": [{"!f1": 0.881, "!precision": 0.996, "!recall": 0.79, "accuracy": 0.794, "f1": 0.23, "filter_rate": 0.767, "fpr": 0.21, "match_rate": 0.233, "precision": 0.132, "recall": 0.901, "threshold": 0.091}, {"!f1": 0.984, "!precision": 0.973, "!recall": 0.995, "accuracy": 0.969, "f1": 0.329, "filter_rate": 0.988, "fpr": 0.005, "match_rate": 0.012, "precision": 0.62, "recall": 0.224, "threshold": 0.769}, {"!f1": 0.983, "!precision": 0.967, "!recall": 1.0, "accuracy": 0.967, "f1": 0.061, "filter_rate": 0.999, "fpr": 0.0, "match_rate": 0.001, "precision": 0.913, "recall": 0.032, "threshold": 0.941}]}}}}}}
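The thresholds response nests one list per label (`"true"`, `"false"`). Reading the numbers, the three `"true"` entries appear to line up, in order, with the three `"true"` queries in the URL (recall 0.901 ≥ 0.9, precision 0.62 ≥ 0.6, precision 0.913 ≥ 0.9), but the Python sketch below does not rely on that ordering. `thresholds_for` and `needs_review` are hypothetical helpers, and the comparison in `needs_review` is an assumption: it treats a `"true"` threshold as a cut-off on the model's probability for the `"true"` class, and crossing it means "needs review", not "vandalism".

```python
import json


def thresholds_for(response_text, wiki, model, label):
    """Return (threshold, precision, recall) for each entry ORES sent for a label."""
    data = json.loads(response_text)
    table = data[wiki]["models"][model]["statistics"]["thresholds"][label]
    return [(e["threshold"], e["precision"], e["recall"]) for e in table]


def needs_review(p_true, threshold):
    # Assumption: "true" thresholds are compared against the model's
    # probability for the "true" class. Crossing one means "look at this
    # edit", not "this edit is vandalism".
    return p_true >= threshold
```

With the response above, `thresholds_for(..., "simplewiki", "damaging", "true")` gives thresholds 0.091, 0.769 and 0.941; the lowest one trades precision (0.132) for recall (0.901), which is the kind of setting that highlights many edits that turn out to be fine.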

Cache contents show the correct values!

Right now I'm not seeing the ORES filters in RC at all on simplewiki in labs, despite https://gerrit.wikimedia.org/r/395066.

It's now throwing a stack trace saying that no goodfaith model exists in the database...

I ran CheckModelVersions manually, which brought the database models back. I think that's not on a cronjob, so I'll add it to the "new model checklist".

Change 399464 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Don't double-quote model version

https://gerrit.wikimedia.org/r/399464

@Catrope This is unstalled and ready for testing on the beta cluster. Would you like to own the rest of the config + deployment, since you have a patch ready?

@Catrope I should have read the title of the task... this is ours for a while longer. We need to poke at the data and see if a model built for enwiki is valid on simplewiki, since it's the first time we've tried such boldness.

Oh this is done. It has been reviewed by @Adotchar

Change 399464 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Don't double-quote model version

https://gerrit.wikimedia.org/r/399464

@Halfak @Adotchar can either of you confirm that this task can be closed? I remember some confusion on IRC when Aaron's comment went through. I would assume that experimentation would have to be done on the beta cluster, which doesn't have the models enabled yet, so I'm unsure how this task would be completed. I'll update the task description per my understanding; please edit if I'm off-base.

I’ve been testing this for a few weeks. English models work perfectly.

awight claimed this task.

Thanks for the confirmation!

Change 395059 abandoned by Awight:
Enable ORES on simplewiki

https://gerrit.wikimedia.org/r/395059