
Investigate nlwiki 'reverted' model seems broken (always ~0.89 for anonymous edits)
Closed, Resolved (Public)

Description

RTRC currently only uses the damaging model to annotate the feed of contributions.

However, since many wikis only have the 'reverted' model, I considered using it as a fallback when the 'damaging' model isn't available for the current wiki. Something seems broken, though: for any revision by an anonymous user that is not a page creation, it almost always returns a true probability of over 0.80.

Here is a score query for the 75 most recent anonymous edits (revision IDs taken from https://nl.wikipedia.org/w/api.php?rcdir=older&rcprop=flags%7Cids&rcshow=!bot%7Canon&rclimit=75&rctype=edit&format=json&action=query&list=recentchanges):

https://ores.wmflabs.org/scores/nlwiki/?models=reverted&revids=46277117%7C46277092%7C46277090%7C46277084%7C46277077%7C46277064%7C46277052%7C46277051%7C46277043%7C46277026%7C46277024%7C46277023%7C46277021%7C46277019%7C46277017%7C46277013%7C46276963%7C46276849%7C46276844%7C46276832%7C46276815%7C46276799%7C46276769%7C46276768%7C46276758%7C46276741%7C46276737%7C46276728%7C46276717%7C46276691%7C46276674%7C46276667%7C46276665%7C46276664%7C46276659%7C46276657%7C46276651%7C46276646%7C46276636%7C46276632%7C46276628%7C46276624%7C46276622%7C46276613%7C46276611%7C46276569%7C46276566%7C46276561%7C46276529%7C46276524%7C46276510%7C46276502%7C46276501%7C46276494%7C46276489%7C46276484%7C46276480%7C46276477%7C46276475%7C46276473%7C46276466%7C46276460%7C46276451%7C46276450%7C46276447%7C46276440%7C46276432%7C46276430%7C46276426%7C46276394%7C46276347%7C46276342%7C46276336%7C46276321%7C46276315

Every one of them is scored at roughly reverted.probability.false ~ 0.11 and reverted.probability.true ~ 0.89.
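A quick way to confirm that the scores really are near-constant is to summarise the spread of reverted.probability.true across the batch. The following is a minimal sketch, not the RTRC code: the helper names are made up for illustration, the response shape (revision ID -> model -> probability) is assumed from the field names quoted above, and the three sample values are the linear SVC scores quoted further down in this task rather than a live response.

```python
from statistics import mean, pstdev
from urllib.parse import urlencode

ORES_BASE = "https://ores.wmflabs.org/scores"  # endpoint as used in this report


def build_scores_url(wiki, model, rev_ids):
    """Build a scores URL; rev IDs are joined by '|' (encoded as %7C)."""
    query = urlencode({"models": model,
                       "revids": "|".join(str(r) for r in rev_ids)})
    return f"{ORES_BASE}/{wiki}/?{query}"


def summarize_true_probability(scores, model="reverted"):
    """Return (min, max, mean, population std dev) of P(true).

    Assumes `scores` maps rev ID -> {model: {"probability": {"true": p}}}.
    """
    probs = [entry[model]["probability"]["true"] for entry in scores.values()]
    return min(probs), max(probs), mean(probs), pstdev(probs)


# Assumed response shape, filled with three scores quoted in this task.
sample = {
    "46292902": {"reverted": {"probability": {
        "false": 0.11384271293716051, "true": 0.8861572870628395}}},
    "46292903": {"reverted": {"probability": {
        "false": 0.11977673360400609, "true": 0.8802232663959937}}},
    "46292891": {"reverted": {"probability": {
        "false": 0.1287924209736749, "true": 0.8712075790263251}}},
}

url = build_scores_url("nlwiki", "reverted", [46277117, 46277092])
low, high, avg, sd = summarize_true_probability(sample)
print(url)
print(f"min={low:.3f} max={high:.3f} mean={avg:.3f} sd={sd:.3f}")
```

A very small standard deviation over a varied batch of edits suggests the model is keying on a single feature (here, presumably the anonymous-user flag) rather than on the content of each edit.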

Event Timeline

Thanks for flagging this. We'll have a look.

So, I did a little bit of digging. It looks like our linear SVC model has this problem -- even when I trained it with a newer set of features. It looks like the model really "wants" to learn that anons are saving edits that will need to be reverted. But it looks like we are getting more nuance from a GradientBoosting model.

$ revscoring score models/nlwiki.reverted.linear_svc_balanced.model --host https://nl.wikipedia.org 46292902
46292902        {"probability": {"false": 0.11384271293716051, "true": 0.8861572870628395}, "prediction": true}

$ revscoring score models/nlwiki.reverted.gradient_boosting.model --host https://nl.wikipedia.org 46292902
46292902        {"probability": {"false": 0.17746548295142595, "true": 0.822534517048574}, "prediction": true}

$ revscoring score models/nlwiki.reverted.linear_svc_balanced.model --host https://nl.wikipedia.org 46292903
46292903        {"probability": {"false": 0.11977673360400609, "true": 0.8802232663959937}, "prediction": true}

$ revscoring score models/nlwiki.reverted.gradient_boosting.model --host https://nl.wikipedia.org 46292903
46292903        {"probability": {"false": 0.2760729420643149, "true": 0.7239270579356851}, "prediction": true}

$ revscoring score models/nlwiki.reverted.linear_svc_balanced.model --host https://nl.wikipedia.org 46292891
46292891        {"probability": {"false": 0.1287924209736749, "true": 0.8712075790263251}, "prediction": true}

$ revscoring score models/nlwiki.reverted.gradient_boosting.model --host https://nl.wikipedia.org 46292891
46292891        {"probability": {"false": 0.4032444215147887, "true": 0.5967555784852113}, "prediction": true}

So I think the moral of the story is that we should switch to the GradientBoosting model as soon as possible. Stay tuned.
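To make the difference between the two models easier to see, the outputs above can be parsed and compared side by side. This is only an illustrative sketch: the helper names are hypothetical, and the input is the revscoring output copied from this comment (a revision ID, whitespace, then a JSON score), not a fresh run.

```python
import json

# Output lines copied from the revscoring runs above: "<rev_id> <JSON score>"
LINEAR_SVC = """
46292902 {"probability": {"false": 0.11384271293716051, "true": 0.8861572870628395}, "prediction": true}
46292903 {"probability": {"false": 0.11977673360400609, "true": 0.8802232663959937}, "prediction": true}
46292891 {"probability": {"false": 0.1287924209736749, "true": 0.8712075790263251}, "prediction": true}
"""

GRADIENT_BOOSTING = """
46292902 {"probability": {"false": 0.17746548295142595, "true": 0.822534517048574}, "prediction": true}
46292903 {"probability": {"false": 0.2760729420643149, "true": 0.7239270579356851}, "prediction": true}
46292891 {"probability": {"false": 0.4032444215147887, "true": 0.5967555784852113}, "prediction": true}
"""


def parse_scores(output):
    """Map rev ID -> P(true) from revscoring output lines."""
    result = {}
    for line in output.strip().splitlines():
        rev_id, payload = line.split(None, 1)  # split on first whitespace run
        result[int(rev_id)] = json.loads(payload)["probability"]["true"]
    return result


def spread(probs):
    """Difference between the highest and lowest P(true)."""
    return max(probs.values()) - min(probs.values())


svc = parse_scores(LINEAR_SVC)
gb = parse_scores(GRADIENT_BOOSTING)

for rev_id in svc:
    print(f"{rev_id}  linear_svc={svc[rev_id]:.3f}  "
          f"gradient_boosting={gb[rev_id]:.3f}")
print(f"spread: linear_svc={spread(svc):.3f}  "
      f"gradient_boosting={spread(gb):.3f}")
```

On these three revisions the linear SVC scores span about 0.015, while the GradientBoosting scores span about 0.226, which is the extra nuance referred to above. Three revisions are of course far too few to draw firm conclusions.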

Halfak renamed this task from ORES nlwiki 'reverted' model seems broken (always ~0.89 for anonymous edits) to Investigate nlwiki 'reverted' model seems broken (always ~0.89 for anonymous edits). Mar 13 2016, 4:33 PM
Halfak moved this task from Parked to Backlog on the Machine-Learning-Team (Active Tasks) board.

The GradientBoosting model is now deployed. I expect the model to perform better based on the fitness statistics. Let me know how it works in practice.

The number of coloured new edits (as shown by a script that uses ORES) has decreased significantly, so it seems fixed.