
Write prototype MediaWiki extension to surface ORES scores
Closed, Resolved · Public

Description

Fetch and cache scores from the ORES server, as new edits are made. Optionally display the scores in the Recent Changes feed.
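
As a rough illustration of the fetch-and-cache idea, here is a minimal Python sketch (the extension itself is a MediaWiki extension, so this is not its code). The `ScoreCache` class, the `on_new_edit` method, and the `fake_fetch` stand-in are all hypothetical names, and the fetcher deliberately avoids assuming anything about the real ORES request or response format.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: a fetcher takes (model, rev_id) and returns a score.
Fetcher = Callable[[str, int], float]


class ScoreCache:
    """Caches scores keyed by (model, rev_id); calls the fetcher only on a miss."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._scores: Dict[Tuple[str, int], float] = {}

    def on_new_edit(self, model: str, rev_id: int) -> float:
        key = (model, rev_id)
        if key not in self._scores:
            self._scores[key] = self._fetch(model, rev_id)
        return self._scores[key]


calls = []


def fake_fetch(model: str, rev_id: int) -> float:
    # Stand-in for an HTTP request to the scoring server.
    calls.append((model, rev_id))
    return 0.25


cache = ScoreCache(fake_fetch)
first = cache.on_new_edit("reverted", 1001)
second = cache.on_new_edit("reverted", 1001)  # served from the cache
```

Passing the fetcher in as a parameter keeps the caching logic testable without a live scoring server.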

This first iteration should land somewhere between barely workable and the MVP.

This is a Q2 goal for Research.

Patches to review

mwext-ORES

mw-core

Event Timeline

awight created this task. Sep 17 2015, 8:26 AM
awight claimed this task.
awight raised the priority of this task from to Normal.
awight updated the task description.
awight added a subscriber: awight.
Restricted Application added a subscriber: Aklapper. Sep 17 2015, 8:26 AM

Change 238825 had a related patch set uploaded (by Awight):
Build schema to store full classifier outputs; fix UI

https://gerrit.wikimedia.org/r/238825

Change 229423 had a related patch set uploaded (by Legoktm):
Initial commit

https://gerrit.wikimedia.org/r/229423

Change 238825 abandoned by Legoktm:
Build schema to store full classifier outputs; fix UI

Reason:
squashed into parent

https://gerrit.wikimedia.org/r/238825

Halfak added a project: Research.
Halfak set Security to None.
Halfak moved this task from Staged to In Progress on the Research board.
Halfak moved this task from In Progress to Radar on the Research board.
Halfak updated the task description. Sep 17 2015, 10:34 PM

Some IRC discussion outcomes (legoktm, awight):

We want a maintenance script that will be run whenever a new model version is released, which will take the model and version as arguments and will repopulate all revisions within the last 30 days with new scores. This should use a PoolCounter to prevent overloading the ORES server.
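
The repopulation script described above could look roughly like the following Python sketch. It is not the extension's code: the `repopulate` function and its parameters are hypothetical, the version string `"0.0.1"` is made up, and a `ThreadPoolExecutor` worker limit stands in for PoolCounter as the throttle.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple


def repopulate(
    revisions: List[Tuple[int, datetime]],
    model: str,
    version: str,
    score: Callable[[str, int], float],
    now: datetime,
    max_concurrent: int = 2,
    window_days: int = 30,
) -> Dict[Tuple[int, str, str], float]:
    """Rescore every revision inside the window, at most max_concurrent at a time."""
    cutoff = now - timedelta(days=window_days)
    recent = [rev_id for rev_id, ts in revisions if ts >= cutoff]
    # The worker limit caps simultaneous requests to the scoring server.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        scores = list(pool.map(lambda rev_id: score(model, rev_id), recent))
    return {(rev_id, model, version): s for rev_id, s in zip(recent, scores)}


now = datetime(2015, 10, 1)
revisions = [
    (1, now - timedelta(days=45)),  # older than the window, skipped
    (2, now - timedelta(days=10)),
    (3, now - timedelta(days=1)),
]
result = repopulate(revisions, "reverted", "0.0.1", lambda m, r: 0.5, now)
```

The model and version are taken as arguments, matching the description, so the same function covers any newly released model.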

We also need a script that can purge anything matching a (model, version) in case we discover a bad model.
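
A purge by (model, version) can be sketched as a single parameterized delete. This Python example uses an in-memory SQLite database purely for illustration; the `oc_*` column names, the sample rows, and the version strings are assumptions, and only the table name `ores_classification` appears in this discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only the table name comes from this thread.
conn.execute(
    "CREATE TABLE ores_classification ("
    " oc_rev INTEGER, oc_model TEXT, oc_version TEXT, oc_score REAL)"
)
conn.executemany(
    "INSERT INTO ores_classification VALUES (?, ?, ?, ?)",
    [
        (1, "reverted", "0.0.1", 0.9),
        (2, "reverted", "0.0.2", 0.1),
        (3, "damaging", "0.0.1", 0.4),
    ],
)


def purge(conn: sqlite3.Connection, model: str, version: str) -> int:
    """Delete every cached row for one (model, version); return the number removed."""
    cur = conn.execute(
        "DELETE FROM ores_classification WHERE oc_model = ? AND oc_version = ?",
        (model, version),
    )
    conn.commit()
    return cur.rowcount


removed = purge(conn, "reverted", "0.0.1")
remaining = conn.execute(
    "SELECT oc_rev FROM ores_classification ORDER BY oc_rev"
).fetchall()
```

Rows for other versions of the same model, and for other models, are left untouched.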

Change 239326 had a related patch set uploaded (by Awight):
Maintenance script to purge bad cached results

https://gerrit.wikimedia.org/r/239326

Just realized that we need to link ores_classification with revision rather than recentchanges, if we plan to support caching scores on page history.
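
To illustrate why the link target matters, here is a small Python sketch with an in-memory SQLite database. The tables are heavily simplified stand-ins for the MediaWiki ones, and the `oc_*` columns are hypothetical. It assumes that recentchanges rows expire while revision rows persist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for the MediaWiki tables; real schemas have more columns.
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    CREATE TABLE recentchanges (rc_id INTEGER PRIMARY KEY, rc_this_oldid INTEGER);
    -- Hypothetical score table keyed on the revision, not the recentchanges row.
    CREATE TABLE ores_classification (
        oc_rev INTEGER REFERENCES revision (rev_id),
        oc_model TEXT,
        oc_score REAL
    );
    INSERT INTO revision VALUES (10, 7), (11, 7);
    INSERT INTO recentchanges VALUES (500, 11);  -- revision 10 has already aged out
    INSERT INTO ores_classification VALUES (10, 'reverted', 0.8), (11, 'reverted', 0.2);
    """
)

# Page history for page 7: both scores are reachable through revision.
history = conn.execute(
    "SELECT r.rev_id, oc.oc_score FROM revision r"
    " JOIN ores_classification oc ON oc.oc_rev = r.rev_id"
    " WHERE r.rev_page = ? ORDER BY r.rev_id",
    (7,),
).fetchall()

# Going through recentchanges instead only finds the newest revision.
via_rc = conn.execute(
    "SELECT rc.rc_this_oldid, oc.oc_score FROM recentchanges rc"
    " JOIN ores_classification oc ON oc.oc_rev = rc.rc_this_oldid"
).fetchall()
```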

Halfak added a subscriber: Halfak. Sep 25 2015, 4:20 PM

@awight, I reviewed the schema and left a note on it: versions are strings. Otherwise, this looks good to me.

Change 229423 merged by jenkins-bot:
Initial commit

https://gerrit.wikimedia.org/r/229423

Halfak added a subscriber: Legoktm. Oct 16 2015, 5:48 PM

@Legoktm, I understand that the first commit was merged, but that there was more work to do to get versioning and cache invalidation working. What's your plan for that? Should we open a separate card?

Change 239327 had a related patch set uploaded (by Awight):
Encapsulate ORES fetch and store

https://gerrit.wikimedia.org/r/239327

Change 247022 had a related patch set uploaded (by Awight):
Job to download the most recent model versions

https://gerrit.wikimedia.org/r/247022

Change 247034 had a related patch set uploaded (by Awight):
Decrease brokenness

https://gerrit.wikimedia.org/r/247034

Change 239326 abandoned by Awight:
Maintenance script to purge bad cached results

Reason:
Work continued in Icaef8ae2

https://gerrit.wikimedia.org/r/239326

Not quite an MVP yet, because the Recent Changes reverted risk pills are broken when the article has only one change in the view.

Change 247038 had a related patch set uploaded (by Awight):
Show revert risk pill on ungrouped recentchanges lines

https://gerrit.wikimedia.org/r/247038

k, this is ready for review again.

awight added a comment. Edited Oct 18 2015, 3:36 PM

Other features we might still need for an MVP:

  • When recent changes lines are collapsed due to multiple changes to the same article, we probably want the collapsed line to indicate when it contains a revert-risky change? I'm not sure how common this case will be.
  • Do we care about pre-populating the cache? I'm leaning towards letting it fill naturally, as new changes come in. That might look bad when we invalidate an old model version, however.
  • Cache invalidation should be triggered by CheckModelVersions. When it detects a new version has been released, that model should be invalidated.
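
The invalidation step in the last bullet might be sketched like this in Python. The function name `check_model_versions`, the dictionary-based cache, and the version strings are all hypothetical; the real CheckModelVersions job may be structured quite differently.

```python
from typing import Dict, List, Tuple


def check_model_versions(
    known: Dict[str, str],
    latest: Dict[str, str],
    cache: Dict[Tuple[str, int], float],
) -> List[str]:
    """Invalidate cached scores for every model whose version string changed."""
    changed = [m for m, v in latest.items() if known.get(m) != v]
    # Collect the keys first so the dict is not modified while iterating over it.
    for key in [k for k in cache if k[0] in changed]:
        del cache[key]
    known.update(latest)
    return changed


known = {"reverted": "0.0.1", "damaging": "0.0.1"}
cache = {("reverted", 1): 0.9, ("damaging", 1): 0.4}
changed = check_model_versions(
    known, {"reverted": "0.0.2", "damaging": "0.0.1"}, cache
)
```

Versions are compared as opaque strings, so any change counts as a new release and only that model's entries are dropped.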

Change 247185 had a related patch set uploaded (by Awight):
Flag reverted risk rows using the recentChangesFlag

https://gerrit.wikimedia.org/r/247185

Change 247249 had a related patch set uploaded (by Awight):
Generalize recentChangesFlags rollup

https://gerrit.wikimedia.org/r/247249

Change 239327 merged by jenkins-bot:
Encapsulate ORES fetch and store

https://gerrit.wikimedia.org/r/239327

Change 247022 merged by jenkins-bot:
Job to download the most recent model versions

https://gerrit.wikimedia.org/r/247022

Change 247034 merged by jenkins-bot:
Write purge script; decrease brokenness

https://gerrit.wikimedia.org/r/247034

Change 247038 abandoned by Awight:
Show revert risk pill on ungrouped recentchanges lines

https://gerrit.wikimedia.org/r/247038

Change 247790 had a related patch set uploaded (by Awight):
Always return true from hooks; docstrings

https://gerrit.wikimedia.org/r/247790

Change 247790 merged by jenkins-bot:
Always return true from hooks; docstrings

https://gerrit.wikimedia.org/r/247790

awight added a comment. Dec 3 2015, 8:36 AM

@Halfak
Should I be using the "damaging" model rather than "reverted"?

Change 256641 had a related patch set uploaded (by Awight):
Actually implement the ORES RC filter

https://gerrit.wikimedia.org/r/256641

Halfak added a comment. Dec 3 2015, 4:43 PM

@awight, good Q. I think that we should prefer "damaging" over "reverted".

It would be great if we could also include "goodfaith" if available, but we can leave that to future work.

Change 256641 abandoned by Awight:
Actually implement the ORES RC filter

Reason:
squashed

https://gerrit.wikimedia.org/r/256641

awight added a comment. Dec 8 2015, 5:54 AM

> It would be great if we could also include "goodfaith" if available, but we can leave that to future work.

Side note: Results from all available models will be cached by the code as it is already, and the schema is designed with multiple models in mind, so this future work is just a matter of accessing the data.

awight updated the task description. Dec 9 2015, 10:04 AM

@Halfak
Can you say a bit about how we'll use "goodfaith" scores in the extension? Should we be tagging RecentChanges with that information, or use it in a filter? Or use it to calculate some qualification for "damaging"?

Change 257851 had a related patch set uploaded (by Awight):
Switch from the "reverted" to "damaging" model

https://gerrit.wikimedia.org/r/257851

awight updated the task description. Dec 9 2015, 10:33 AM

Change 257856 had a related patch set uploaded (by Awight):
Simplify down to a single threshold

https://gerrit.wikimedia.org/r/257856

awight updated the task description. Dec 9 2015, 10:50 AM

@awight, I'm not sure how we'll want to make use of the info. I suspect that pulling in a designer might be valuable here, but here are a few thoughts I have.

  1. just include the "goodfaith" score wherever scores are listed -- this would allow a reviewer to make judgements about how to react to vandalism when spotted. It would be great if this was related to the diff itself.
  2. allow for a specific filter on "is not goodfaith" -- this would allow a reviewer to focus on edits that are likely to be vandalism specifically.
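
The second option could be sketched as a simple threshold filter. This Python example is only an illustration: the `not_goodfaith` function, the dictionary layout of a change, and the 0.5 cutoff are all assumptions, since no threshold value is given here.

```python
from typing import Dict, List

# Hypothetical cutoff; the thread does not specify a threshold value.
GOODFAITH_THRESHOLD = 0.5


def not_goodfaith(
    changes: List[Dict], threshold: float = GOODFAITH_THRESHOLD
) -> List[Dict]:
    """Keep changes whose goodfaith score is below the threshold.

    Changes without a goodfaith score are left out rather than guessed at.
    """
    return [
        c for c in changes
        if c.get("goodfaith") is not None and c["goodfaith"] < threshold
    ]


changes = [
    {"rev_id": 1, "damaging": 0.9, "goodfaith": 0.1},
    {"rev_id": 2, "damaging": 0.8, "goodfaith": 0.7},
    {"rev_id": 3, "damaging": 0.2},  # goodfaith model not available
]
flagged = [c["rev_id"] for c in not_goodfaith(changes)]
```

Skipping changes that lack a score follows the "if available" caveat earlier in the discussion.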

Right now, the per-edit "goodfaith" prediction isn't as useful as it could be. I think we really want to do a different type of model (one that accounts for a set of recent edits -- profiling the editor) to flag editors as good-faith or bad-faith, but that is farther out and should probably be left for future consideration.

@violetto has agreed to take a look at Wikilabels. Maybe I can get some advice from her about this too. I'll reach out.

Change 258916 had a related patch set uploaded (by Awight):
Add an config variable to override the wiki ID

https://gerrit.wikimedia.org/r/258916

We have a working prototype now, just waiting for code review...

awight updated the task description. Dec 15 2015, 11:12 PM
awight updated the task description.

Change 247185 merged by jenkins-bot:
Flag reverted risk rows using the recentChangesFlag

https://gerrit.wikimedia.org/r/247185

Change 257851 merged by jenkins-bot:
Switch from the "reverted" to "damaging" model

https://gerrit.wikimedia.org/r/257851

Change 257856 merged by jenkins-bot:
Simplify down to a single threshold

https://gerrit.wikimedia.org/r/257856

Change 258916 merged by jenkins-bot:
Add an config variable to override the wiki ID

https://gerrit.wikimedia.org/r/258916

awight closed this task as Resolved. Dec 28 2015, 11:09 PM

Remaining work is in subtasks of T112856.

Change 247249 merged by jenkins-bot:
Generalize recentChangesFlags rollup

https://gerrit.wikimedia.org/r/247249