# Background
**Product overview document:** https://docs.google.com/document/d/1rUzRNBGKi7Vi9RS4vVXaNyNUzqc99-xvkTbmsz0FkC8/edit
----
//If we enable communities to automatically prevent or revert obvious vandalism, moderators will have more time to spend on other activities.//
----
## Goals
- Reduce moderation backlogs by preventing bad edits from entering patroller queues.
- Give moderators confidence that automoderation is reliable and is not producing significant false positives.
- Ensure that editors caught in a false positive have clear avenues to flag the error / have their edit reinstated.
Further user stories are documented [[ https://docs.google.com/document/d/1rUzRNBGKi7Vi9RS4vVXaNyNUzqc99-xvkTbmsz0FkC8/edit#heading=h.mktmvkdvaa0n | here ]].
## Helpful links
- [[ https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing | LiftWing ]] ([[ https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage | Usage ]])
- Models: [[ https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Multilingual_revert_risk | Multilingual ]] / [[ https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_revert_risk | Language-agnostic ]]
## Anti-vandalism bots
| Bot | Code repository |
| --- | --- |
| ClueBot NG | https://github.com/cluebotng |
| SeroBOT | https://github.com/dennistobar/serobot |
| ChenzwBot | https://gitlab.com/antivandalbot-ng |
| Рейму Хакурей | https://github.com/Saisengen/wikibots/blob/main/other-bots/vand-rollbacker-DB.cs |
| PatrocleBot | https://github.com/rowiki/oresreverter |
# Investigation
We want to investigate the technical approach we might take for Automoderator at a high level, answering questions such as:
- Should this be a MediaWiki extension, or some kind of Cloud-hosted tool?
- How should we approach community configuration? Will we aim to use the Growth team's Community Configuration toolset?
- Are we likely to have any technical requests for the Machine Learning platform team for our use of LiftWing?
- As a tool which will be actively editing Wikimedia projects, are there any development principles we can set to ensure that we minimise the introduction of breaking changes as we iterate on it?
Engineers should feel free to tackle any other high-level questions they might have about our approach beyond the above.
## Findings
> Should this be a MediaWiki extension, or some kind of Cloud-hosted tool?
This should be an external tool. It will need to take each revision on a given project as input. To scale effectively, that means subscribing to a stream of filtered events and requesting a score for each one, via either ChangeProp or Flink. An extension would mainly be beneficial if we were using hooks to fire those requests after a revision is created, and it would come with the downsides of being tied to the deployment train and sharing resources with the production site.
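As a rough sketch of that flow, assuming the public EventStreams `recentchange` feed and the language-agnostic revert-risk model on LiftWing (the project filter, threshold, and response shape shown here are illustrative and should be checked against the LiftWing usage docs linked above):
```
import json
import requests

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-language-agnostic:predict"
)

def revert_risk(rev_id: int, lang: str) -> float:
    """Ask LiftWing for the revert-risk probability of one revision."""
    resp = requests.post(LIFTWING_URL, json={"rev_id": rev_id, "lang": lang})
    resp.raise_for_status()
    # Assumed response shape; verify against the LiftWing usage docs.
    return resp.json()["output"]["probabilities"]["true"]

def main() -> None:
    # Consume the server-sent-events stream and score each qualifying edit.
    with requests.get(STREAM_URL, stream=True) as stream:
        for line in stream.iter_lines():
            if not line.startswith(b"data: "):
                continue
            event = json.loads(line[len(b"data: "):])
            # Filter to main-namespace edits on one project (idwiki here).
            if (
                event.get("wiki") != "idwiki"
                or event.get("type") != "edit"
                or event.get("namespace") != 0
            ):
                continue
            score = revert_risk(event["revision"]["new"], "id")
            print(event["title"], score)

if __name__ == "__main__":
    main()
```
In production this consumer would sit behind ChangeProp or Flink rather than a bare SSE loop, but the shape of the work is the same.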
> How should we approach community configuration? Will we aim to use the Growth team's Community Configuration toolset?
Since this will be off-wiki, we won't be using Community Configuration. Enabling the tool for a project will mean updating our filter in whatever event system we subscribe to, but we'll want to create an interface for configuring the thresholds within each active project. This could be as simple as a web form with a couple of numeric input fields, with access restricted to a small set of users via OAuth.
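As a sketch of how small that interface could be, assuming Flask; the allowlist, route, and the header-based stand-in for the OAuth check are all placeholders:
```
from flask import Flask, abort, request

app = Flask(__name__)

# Placeholders: a real tool would take identity from Wikimedia OAuth and
# store per-project thresholds in a database.
ALLOWED_USERS = {"ExampleAdmin"}
thresholds: dict[str, float] = {}

def authenticated_user() -> str:
    """Stand-in for the OAuth handshake (not implemented in this sketch)."""
    return request.headers.get("X-Debug-User", "")

@app.route("/config/<project>", methods=["GET", "POST"])
def config(project: str):
    if authenticated_user() not in ALLOWED_USERS:
        abort(403)
    if request.method == "POST":
        thresholds[project] = float(request.form["revert_threshold"])
    current = thresholds.get(project, 0.99)
    return (
        f'<form method="post">'
        f'<input name="revert_threshold" type="number" '
        f'min="0.90" max="1" step="0.001" value="{current}">'
        f'<button>Save</button></form>'
    )
```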
> Are we likely to have any technical requests for the Machine Learning platform team for our use of LiftWing?
I think questions will come up as we implement, but none that require special consideration; the Machine Learning team has been very responsive to one-off queries.
> As a tool which will be actively editing Wikimedia projects, are there any development principles we can set to ensure that we minimise the introduction of breaking changes as we iterate on it?
Yes. Especially at the beginning, we should design to avoid false positives, even at the cost of efficacy.
We should hard-code some guardrails that cannot be overridden through configuration. Each guardrail should be an internal implementation detail of whatever module/class performs the behaviour it protects. For example:
- We should disallow revert thresholds below a designated "safe" value, such as a 90% revert-risk probability. To protect ourselves while developing the tool, the code that actually performs the revert could live in a separate class/module that internally hard-codes this limit. If the guardrail is private, it is less likely to be accidentally overridden by another class/module that requests a revert.
- If we support multiple thresholds (e.g., an additional "marginal" threshold at which the tool takes a non-revert action such as tagging or sending a notification), we should not allow the thresholds to overlap.
Warnings or errors should be raised whenever configuration or code bumps into a guardrail (i.e. whenever the tool would have performed a "bad" action without the guardrail in place); see the sketch below.
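A minimal sketch of that pattern, with illustrative names and values: the floor is a private constant of the class that performs reverts, overlapping thresholds are rejected, and bumping into a guardrail raises loudly rather than proceeding silently.
```
class RevertPolicy:
    """Owns revert decisions; its guardrails are internal and non-configurable."""

    # Hard-coded safety floor, deliberately private so that callers and
    # configuration cannot override it.
    _MINIMUM_REVERT_THRESHOLD = 0.90

    def __init__(self, revert_threshold: float, marginal_threshold: float):
        if revert_threshold < self._MINIMUM_REVERT_THRESHOLD:
            # Bumping into a guardrail is a loud error, never a silent clamp.
            raise ValueError(
                f"revert threshold {revert_threshold} is below the "
                f"hard-coded floor {self._MINIMUM_REVERT_THRESHOLD}"
            )
        if marginal_threshold >= revert_threshold:
            raise ValueError("marginal threshold must sit below revert threshold")
        self._revert_threshold = revert_threshold
        self._marginal_threshold = marginal_threshold

    def action_for(self, score: float) -> str:
        """Map a revert-risk score to an action, defaulting to none."""
        if score >= self._revert_threshold:
            return "revert"
        if score >= self._marginal_threshold:
            return "notify"
        return "none"
```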
The tool should be able to be disabled rapidly, and the moderator community itself should be able to trigger that kill switch in case of a spike in false positives.
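One convention many Wikimedia bots already use is an on-wiki "run page" that moderators can edit; the tool checks it before acting and fails safe when in doubt. A sketch, with a hypothetical page name:
```
import requests

API_URL = "https://en.wikipedia.org/w/api.php"
RUN_PAGE = "User:Automoderator/Run"  # hypothetical kill-switch page

def tool_enabled() -> bool:
    """Return True only if the community-editable run page says 'yes'."""
    resp = requests.get(API_URL, params={
        "action": "query",
        "prop": "revisions",
        "titles": RUN_PAGE,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    })
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    if "revisions" not in page:
        return False  # page missing or unreadable: fail safe, stay disabled
    content = page["revisions"][0]["slots"]["main"]["content"]
    return content.strip().lower() == "yes"
```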
We should consider the possibility of circular reverts / revert wars: if the tool reverts a revision and a human then overrules it by reverting the tool's revert, the tool must not loop by reverting that revert in turn. That may be a case where the tool should take another action instead, such as sending a notification.
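A simple way to break that loop, sketched below with illustrative names and window length: never act on the tool's own edits, and revert at most once per page within a cooldown window, so that a human overrule stands.
```
import time

BOT_USERNAME = "Automoderator"   # hypothetical account name
COOLDOWN_SECONDS = 24 * 60 * 60  # at most one revert per page per day

_last_revert_at: dict[str, float] = {}  # page title -> time of our last revert

def may_revert(event: dict) -> bool:
    """Decline to revert our own edits or to re-revert within the cooldown."""
    if event.get("user") == BOT_USERNAME:
        return False  # never score or revert the tool's own edits
    last = _last_revert_at.get(event["title"])
    if last is not None and time.time() - last < COOLDOWN_SECONDS:
        # We already reverted this page recently; a second qualifying edit
        # is likely a human overruling us, so notify instead of reverting.
        return False
    return True

def record_revert(title: str) -> None:
    _last_revert_at[title] = time.time()
```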
Business logic should always default to no action. Pardon the pseudocode, but as an example, do this:
```
def handle(score):
    if score > 0.90:
        revert()
        return True
    # Only the explicit high-confidence branch acts; everything
    # unanticipated defaults to "take no action".
    return False
```
instead of:
```
def handle(score):
    if score < 0.90:
        return False
    # Dangerous: the *default* branch is the destructive action, so any
    # unanticipated input (e.g. NaN) falls through to a revert.
    revert()
    return True
```