Background
Product overview document: https://docs.google.com/document/d/1rUzRNBGKi7Vi9RS4vVXaNyNUzqc99-xvkTbmsz0FkC8/edit
If we enable communities to automatically prevent or revert obvious vandalism, moderators will have more time to spend on other activities.
Goals
- Reduce moderation backlogs by preventing bad edits from entering patroller queues.
- Give moderators confidence that automoderation is reliable and is not producing significant false positives.
- Ensure that editors caught in a false positive have clear avenues to flag the error / have their edit reinstated.
Further user stories are documented here.
Helpful links
- LiftWing (Usage)
- Models: Multilingual / Language-agnostic
Anti-vandalism bots
Bot | Code repository |
ClueBot NG | https://github.com/cluebotng |
SeroBOT | https://github.com/dennistobar/serobot |
ChenzwBot | https://gitlab.com/antivandalbot-ng |
Рейму Хакурей | https://github.com/Saisengen/wikibots/blob/main/other-bots/vand-rollbacker-DB.cs |
PatrocleBot | https://github.com/rowiki/oresreverter |
Investigation
We want to investigate the technical approach we might take for Automoderator at a high level, answering questions such as:
- Should this be a MediaWiki extension, or some kind of Cloud-hosted tool?
- the user-facing parts should be in an extension
- Per T349295 (Determine technical approach for Automoderator edit revert component), the revert component, as well as the source of scores, should also be in an extension
- How should we approach community configuration?
- Per T349374 (How will communities configure Automoderator?), we aim to use the Growth team's Community Configuration toolset
- Are we likely to have any technical requests for the Machine Learning platform team for our use of LiftWing?
- Nothing bigger than one-off support questions
- As a tool which will be actively editing Wikimedia projects, are there any development principles we can set to ensure that we minimise the introduction of breaking changes as we iterate on it?
- we should design to err on the side of inaction
- the revert component should have minimum probability guardrails internally hard-coded, to avoid misconfiguration with known-unsafe thresholds
- warnings/errors should be raised when any configuration or code bumps into a guardrail (e.g. if it would perform a "bad" action without the guardrail in place)
- configuration changes should be auditable (an advantage of community config)
- the tool should be able to be disabled rapidly, and the moderator community should be able to use this feature in case of a spike in false positives.
- we should consider the possibility of circular reverts / revert wars: if the tool reverts a revision and a human then overrules it by reverting the tool's revert, the tool should not go into a loop of reverting the revert of a revert. That might be a case for the tool to take another action instead, such as sending a notification (see the sketch after this list). At a minimum, we should implement the same "revert war" filter used to train the model [1].
- business logic should always default to no action.
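As a rough illustration of the loop-avoidance idea above, here is a minimal TypeScript sketch. The revision fields, the bot account name, and shouldSkipRevert() are all invented for illustration and do not correspond to a real MediaWiki API response; the point is only that a human undoing one of our reverts should end the exchange.

    // Sketch of a "revert war" guard: if a human has already undone one of
    // Automoderator's reverts on this page, stay out of the loop and notify
    // instead. Field names (revertOfRevId, isRevert, etc.) are invented for
    // illustration and do not correspond to a real API response.
    interface RevisionInfo {
        revId: number;
        user: string;
        isRevert: boolean;
        revertOfRevId: number | null;
    }

    const AUTOMODERATOR_USER = 'AutoModerator';  // hypothetical bot account name

    function shouldSkipRevert(latest: RevisionInfo, history: RevisionInfo[]): boolean {
        // If the latest revision is itself a revert of one of our reverts,
        // a human has overruled us: do not revert again.
        if (latest.isRevert && latest.revertOfRevId !== null) {
            const reverted = history.find(r => r.revId === latest.revertOfRevId);
            if (reverted && reverted.user === AUTOMODERATOR_USER) {
                return true;  // take a different action, e.g. send a notification
            }
        }
        return false;
    }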
Engineers should feel free to tackle any other high-level questions they might have about our approach beyond the above.
Findings details
As a tool which will be actively editing Wikimedia projects, are there any development principles we can set to ensure that we minimise the introduction of breaking changes as we iterate on it?
We should hard-code some guardrails that cannot be overridden with configuration. The guardrails should be implemented inside whatever module/class performs the action that needs the guardrail. For example:
- We should disallow revert thresholds below a designated "safe" value, such as 90% revert risk probability. To help protect ourselves while developing the tool, the code that actually does the reverting could live in a separate class/module that internally hard-codes this limit. If the guardrail is private, it is less likely to be accidentally overridden by another class/module that calls for a revert.
- If we support multiple thresholds (e.g. an additional "marginal" threshold at which the tool takes a non-revert action, such as tagging or sending a notification), we should not allow the thresholds to overlap (a rough sketch follows this list).
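As a sketch of how both guardrails could be kept private to the revert component (TypeScript for illustration; RevertGuard, MIN_REVERT_THRESHOLD and the warning messages are hypothetical names, not an existing API):

    // Minimal sketch of an internally guarded revert component.
    class RevertGuard {
        // Hard-coded floor: configuration can raise the revert threshold,
        // but can never lower it below this value.
        private static readonly MIN_REVERT_THRESHOLD = 0.90;

        private revertThreshold: number;
        private marginalThreshold: number | null;

        constructor(configuredRevert: number, configuredMarginal: number | null = null) {
            if (configuredRevert < RevertGuard.MIN_REVERT_THRESHOLD) {
                // Surface the misconfiguration loudly, then fall back to the guardrail.
                console.warn(
                    `Configured revert threshold ${configuredRevert} is below the ` +
                    `hard-coded minimum ${RevertGuard.MIN_REVERT_THRESHOLD}; using the minimum.`
                );
                configuredRevert = RevertGuard.MIN_REVERT_THRESHOLD;
            }
            if (configuredMarginal !== null && configuredMarginal >= configuredRevert) {
                // Thresholds must not overlap: marginal actions sit strictly below reverts.
                console.warn('Marginal threshold overlaps the revert threshold; ignoring it.');
                configuredMarginal = null;
            }
            this.revertThreshold = configuredRevert;
            this.marginalThreshold = configuredMarginal;
        }

        shouldRevert(score: number): boolean {
            return score >= this.revertThreshold;
        }

        shouldTakeMarginalAction(score: number): boolean {
            return this.marginalThreshold !== null
                && score >= this.marginalThreshold
                && score < this.revertThreshold;
        }
    }

Because the floor lives inside the class, a caller that passes an unsafe configured value gets a warning and the safe behaviour, rather than silently lowering the bar.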
Business logic should always default to no action. Pardon the pseudo-code, but as an example:
do this:
    switch ( score ) {
        case > 0.90:
            revert()
            break
        default:
            return false
    }
instead of:
    switch ( score ) {
        case < 0.90:
            return false
        default:
            revert()
    }
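For a slightly more concrete version of the same idea, here is a small runnable TypeScript sketch; decideAction() and the Action type are hypothetical names:

    // Runnable sketch of the "default to no action" pattern above.
    type Action = 'revert' | 'none';

    function decideAction(score: number): Action {
        // Only an explicit match on the high-confidence range triggers a revert;
        // every other case, including a missing or non-numeric score, falls
        // through to 'none'.
        if (Number.isFinite(score) && score > 0.90) {
            return 'revert';
        }
        return 'none';
    }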
Additional Info
We'll need to evaluate the UX tradeoffs of being on- or off-wiki in a little more detail:
Let me walk through some user stories. Of course, these are all just my initial assessments:
As a moderator, I want to configure Automoderator with thresholds and settings that my community has agreed on, so that we feel confident it is acting in the way we want.
- onwiki vs offwiki - wash
- For discoverability, I'm not sure that there will be a big difference between links to a special page and links to an externally hosted tool. Instead of talking about this in terms of where the code is hosted, we could talk about where the entry points need to be for discoverability. Even if we have a special page on-wiki, users won't know to go to it unless it is exposed to them.
- If we determine that we really need configuration on-wiki, we could make the service configurable via an API POST request and set up a form on a special page that sends that request on submit (a rough sketch follows this list). We would need to keep the two in sync.
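A hypothetical sketch of that special-page form handler, assuming the off-wiki service exposes a config endpoint (the URL and payload fields below are invented for illustration):

    // Hypothetical client-side handler for a special-page form that forwards
    // configuration to an off-wiki Automoderator service.
    async function submitAutomoderatorConfig(form: HTMLFormElement): Promise<void> {
        const payload = {
            wiki: form.dataset.wiki,  // e.g. "enwiki"
            revertThreshold: Number(
                (form.elements.namedItem('revertThreshold') as HTMLInputElement).value
            ),
            enabled: (form.elements.namedItem('enabled') as HTMLInputElement).checked,
        };

        const response = await fetch('https://automoderator.example.org/api/config', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(payload),
        });

        if (!response.ok) {
            throw new Error(`Config update failed: ${response.status}`);
        }
    }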
As a moderator, I want Automoderator to only take actions on edits which it is qualified to make judgements on, so that the number of false positive reverts is minimized.
- onwiki vs offwiki - wash
- for performance reasons, some of the configured items may need to be set up in the function that filters and consumes the stream (a rough consumer sketch follows this list). We'll need to do another round of investigation on Flink vs. ChangeProp to see whether one has an advantage in terms of configurability, but neither of them runs on-wiki.
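For illustration, a standalone consumer of the public EventStreams feed might apply those filters like this. The endpoint is real, but the config object and scoreAndMaybeRevert() are hypothetical, and whether this logic would live in Flink, ChangeProp, or a standalone service is exactly the open question above:

    // Sketch of a pre-scoring filter over the recent changes stream.
    const config = {
        wiki: 'idwiki',        // hypothetical pilot wiki
        skipBots: true,
        namespaces: [0],       // article namespace only
    };

    function scoreAndMaybeRevert(change: unknown): void {
        // Placeholder: request a revert-risk score from LiftWing, then apply
        // the configured thresholds and guardrails.
    }

    const stream = new EventSource('https://stream.wikimedia.org/v2/stream/recentchange');

    stream.onmessage = (event: MessageEvent) => {
        const change = JSON.parse(event.data);

        // Cheap, community-configurable filters applied before any model call.
        if (change.wiki !== config.wiki) return;
        if (change.type !== 'edit') return;
        if (config.skipBots && change.bot) return;
        if (!config.namespaces.includes(change.namespace)) return;

        scoreAndMaybeRevert(change);
    };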
As a moderator, I want to test different Automoderator settings against recent edits, so that I can understand what will happen when I save configuration changes.
- onwiki vs offwiki - offwiki
- we have some existing work on this available, and it's off-wiki
As a new good faith editor, I want to know when Automoderator has reverted one of my edits and be given clear steps for reporting the false positive, so that I can have my edit reinstated.
- onwiki vs offwiki - onwiki looks best at first blush, but it might come down to what our moderator communities want. I think more user research is warranted.
- question: (how) are we going to handle temp/IP users?
- talk page approach:
- onwiki vs offwiki - wash
- editcheck approach:
- onwiki vs offwiki - onwiki
As a moderator, I want to review false positive reports from new editors, so that I can reinstate good edits which shouldn’t have been reverted.
- onwiki vs offwiki - onwiki
As a Wikimedia Foundation researcher, I want false positive report data to be available to me so that I can retrain the model and make it more accurate.
- onwiki vs offwiki - too early to call
- this is so wide open right now that it's too early to call the best approach.