
Determine high-level technical approach for Automoderator
Closed, ResolvedPublic

Description

Background

Product overview document: https://docs.google.com/document/d/1rUzRNBGKi7Vi9RS4vVXaNyNUzqc99-xvkTbmsz0FkC8/edit


If we enable communities to automatically prevent or revert obvious vandalism, moderators will have more time to spend on other activities.


Goals

  • Reduce moderation backlogs by preventing bad edits from entering patroller queues.
  • Give moderators confidence that automoderation is reliable and is not producing significant false positives.
  • Ensure that editors caught in a false positive have clear avenues to flag the error / have their edit reinstated.

Further user stories are documented here.

Helpful links

Anti-vandalism bots

Investigation

We want to investigate the technical approach we might take for Automoderator at a high level, answering questions such as:

  • Should this be a MediaWiki extension, or some kind of Cloud-hosted tool?
  • How should we approach community configuration?
  • Are we likely to have any technical requests for the Machine Learning platform team for our use of LiftWing?
    • Nothing bigger than one-off support questions.
  • As a tool which will be actively editing Wikimedia projects, are there any development principles we can set to ensure that we minimise the introduction of breaking changes as we iterate on it?
    • We should design to err on the side of inaction.
    • The revert component should have minimum probability guardrails hard-coded internally, to avoid misconfiguration with known-unsafe thresholds.
    • Warnings/errors should be raised when any configuration or code bumps into a guardrail (e.g. if it would perform a "bad" action without the guardrail in place).
    • Configuration changes should be auditable (an advantage of community configuration).
    • The tool should be able to be disabled rapidly, and the moderator community should be able to trigger that shutdown themselves in case of a spike in false positives.
    • We should consider the possibility of circular reverts / revert wars: if the tool reverts a revision and a human then overrules it by reverting the tool's revert, the tool should not loop by reverting the revert of its revert. That might be a case for the tool to take another action, such as sending a notification. At a minimum, we should implement the same "revert war" filter used to train the model [1] (see the sketch after this list):
      • (attachment: image.png, 938×845 px, 258 KB)
    • Business logic should always default to no action.
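
To make the revert-war principle concrete, here is a minimal sketch (in Python, purely illustrative) of what such a guard could look like. The revision objects, the reverts() helper, and the bot account name are hypothetical, not part of any agreed design:

AUTOMODERATOR_ACCOUNT = "AutoModerator"  # assumed bot account name


def should_consider_acting(candidate, recent_page_revisions):
    """Return True only if acting on this revision cannot start a revert loop."""
    # Never act on the tool's own edits.
    if candidate.user == AUTOMODERATOR_ACCOUNT:
        return False

    # If the candidate revision undoes a revert the tool made earlier (i.e. a
    # human has overruled the tool), stand down rather than re-revert; a
    # notification or tag could be used instead.
    for prior in recent_page_revisions:
        if prior.user == AUTOMODERATOR_ACCOUNT and candidate.reverts(prior):
            return False

    return True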

Engineers should feel free to tackle any other high-level questions they might have about our approach beyond the above.

Findings details

As a tool which will be actively editing Wikimedia projects, are there any development principles we can set to ensure that we minimise the introduction of breaking changes as we iterate on it?

We should hardcode some guardrails that cannot be overridden by configuration. The guardrails should be internal implementation details of whatever module/class performs the action being guarded. For example:

  • We should disallow revert thresholds below a designated "safe" value, such as 90% revert risk probability. To help protect ourselves while developing the tool, the code that actually performs the revert could live in a separate class/module that internally hard-codes this limit (see the sketch after this list). If the guardrail is private, it is less likely to be accidentally overridden by another class/module that calls for a revert.
  • If we support multiple thresholds (e.g., an additional "marginal" threshold at which the tool takes a non-revert action, such as tagging or sending a notification), we should not allow the thresholds to overlap.
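
As a sketch of what such a hard-coded guardrail could look like (Python for illustration; the RevertExecutor name, the 0.90 floor, and the method names are assumptions, not decisions):

import logging

logger = logging.getLogger(__name__)


class RevertExecutor:
    """Owns the revert action and the hard-coded safety floor for it."""

    # Private, hard-coded floor: configuration can raise the revert threshold,
    # but nothing outside this class can lower it below this value.
    _MINIMUM_THRESHOLD = 0.90

    def __init__(self, configured_threshold):
        if configured_threshold < self._MINIMUM_THRESHOLD:
            # Surface the misconfiguration loudly instead of silently acting on it.
            logger.warning(
                "Configured revert threshold %.2f is below the hard-coded minimum %.2f; "
                "using the minimum instead.",
                configured_threshold,
                self._MINIMUM_THRESHOLD,
            )
        self._threshold = max(configured_threshold, self._MINIMUM_THRESHOLD)

    def maybe_revert(self, revision, score):
        """Revert only when the score clears the (clamped) threshold."""
        if score >= self._threshold:
            self._do_revert(revision)
            return True
        # Default: take no action.
        return False

    def _do_revert(self, revision):
        # The actual revert call lives here, behind the guardrail.
        raise NotImplementedError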

Business logic should always default to no action. Pardon the pseudo-code, but as an example, do this:

switch ( score ) {
  case > 0.90:
    revert()
    break
  default:
    // default branch: take no action
    return false
}

instead of:

switch ( score ) {
  case < 0.90:
    return false
  default:
    // default branch: take an action (revert)
    revert()
}

Additional Info

We'll need to evaluate the UX tradeoffs of being on- or off-wiki in a little more detail:

let me loop through some user stories. Of course, these are all just my initial assessments:

As a moderator, I want to configure Automoderator with thresholds and settings that my community has agreed on, so that we feel confident it is acting in the way we want.

  • onwiki vs offwiki - wash
  • For discoverability, I'm not sure that there will be a big difference between links to a special page and links to an externally hosted tool. Instead of talking about this in terms of where the code is hosted, we could talk about where the entry points need to be for discoverability. Even if we have a special page on-wiki, users won't know to go to it unless it is exposed to them.
  • If we determine that we really need config on-wiki, we could have the service be configurable via an API POST request and set up a form on a special page that sends that request on submit. We would need to keep those in sync (see the sketch below).
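
As a very rough sketch of the service side of that idea, assuming a hypothetical Flask app and payload shape (the field names and the guardrail check are illustrative only):

from flask import Flask, jsonify, request

app = Flask(__name__)

MINIMUM_REVERT_THRESHOLD = 0.90  # same hard-coded floor as the revert component
current_config = {"enabled": False, "revert_threshold": 0.95}


@app.post("/config")
def update_config():
    payload = request.get_json(force=True)
    threshold = float(payload.get("revert_threshold", current_config["revert_threshold"]))

    # Reject configuration that bumps into a guardrail rather than silently
    # clamping it, so the on-wiki form can show the error to the moderator.
    if threshold < MINIMUM_REVERT_THRESHOLD:
        return jsonify(error="revert_threshold is below the hard-coded minimum"), 400

    current_config["revert_threshold"] = threshold
    current_config["enabled"] = bool(payload.get("enabled", current_config["enabled"]))
    # An audit log entry (who changed what, and when) would be written here.
    return jsonify(current_config), 200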

As a moderator, I want Automoderator to only take actions on edits which it is qualified to make judgements on, so that the number of false positive reverts is minimized.

  • onwiki vs offwiki - wash
  • for performance reasons, some of the configured items may need to be set up in the function that filters and consumes the stream (see the sketch below). We'll need to do another round of investigation on Flink vs. ChangeProp to see if one has an advantage there in terms of configurability, but neither one of them runs on-wiki.
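
For illustration, a minimal sketch of that filter-and-consume step against the public EventStreams recentchange feed; the wiki, namespace, and bot filters are placeholder values, and a production deployment would more likely sit behind ChangeProp or Flink rather than a hand-rolled consumer:

import json

import requests

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"


def edits_worth_scoring(wiki="enwiki"):
    """Yield recentchange events that Automoderator is qualified to judge."""
    with requests.get(STREAM_URL, stream=True, timeout=60) as response:
        for line in response.iter_lines():
            if not line.startswith(b"data: "):
                continue  # skip SSE keepalives and event/id lines
            event = json.loads(line[len(b"data: "):])
            if event.get("wiki") != wiki:
                continue
            if event.get("type") != "edit":
                continue  # e.g. page creations can't be scored by the model
            if event.get("namespace") != 0:
                continue  # articles only, per the initial scope
            if event.get("bot"):
                continue  # leave other bots' edits alone
            yield event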

As a moderator, I want to test different Automoderator settings against recent edits, so that I can understand what will happen when I save configuration changes.

  • onwiki vs offwiki - offwiki
  • we have some existing work on this available, and it's off wiki

As a new good faith editor, I want to know when Automoderator has reverted one of my edits and be given clear steps for reporting the false positive, so that I can have my edit reinstated.

  • onwiki vs offwiki - onwiki looks like the best at first blush, but it might be down to what our moderator communities want. I think more user research is warranted.
  • question: (how) are we going to handle temp/ip users?
  • talk page approach:
    • onwiki vs offwiki - wash
  • editcheck approach:
    • onwiki vs offwiki - onwiki

As a moderator, I want to review false positive reports from new editors, so that I can reinstate good edits which shouldn’t have been reverted.

  • onwiki vs offwiki - onwiki

As a Wikimedia Foundation researcher, I want false positive report data to be available to me so that I can retrain the model and make it more accurate.

  • onwiki vs offwiki - too early to call
  • this is so wide open right now that it's too early to call the best approach.

References

  1. Fair multilingual vandalism detection system for Wikipedia (arxiv.org)

Event Timeline

Added some code repositories for volunteer anti-vandalism bots that I could find, though I wasn't able to locate repositories for others.

The linked Google Docs document isn't publicly viewable. Presumably it should be.

The linked Google Docs document isn't publicly viewable. Presumably it should be.

The document is just our internal product overview document, and was only written with the product team as the intended audience. I could make it visible, but unfortunately Google Docs doesn't have a good way to set up different permission levels for different groups (e.g. comment access for WMF staff and view access for anyone with the link - right now I'd have to give comment access to each staff member individually). All that said, there's nothing particularly interesting in the doc that isn't already documented at https://www.mediawiki.org/wiki/Moderator_Tools/Automoderator - it mostly links to other internal documentation, spreadsheets, and explorations.

jsn.sherman changed the task status from Open to In Progress.Oct 11 2023, 5:00 PM
jsn.sherman updated the task description. (Show Details)

Thanks for spending time on this @jsn.sherman!

Should this be a MediaWiki extension, or some kind of Cloud-hosted tool?

This should be an external tool.

I'd like to know more about this since this decision isn't purely a technical one. We've learned that the myriad of off-wiki tooling is hard for communities to discover - an on-wiki tool would be much easier to find and start using. We'll need to evaluate the UX tradeoffs of being on- or off-wiki in a little more detail, but I'd find it useful if you could delve a little deeper into the pros and cons of these options: what led you to feel that an external tool would be the best option? How much worse of a situation in terms of timeline or effort would it be for us to build an extension?

I'd find it useful if you could delve a little deeper into the pros and cons of these options:

I erred on the side of succinctness, but I can see that I should have shown my work better:

what led you to feel that an external tool would be the best option?

  • Running the tool outside of MediaWiki would let us set up a robust container-based service that is decoupled from MediaWiki and consumes streams of revisions.
    • We've done something similar in Cloud Services before, but there are questions about how the service should be run:
      • Can we run it on Toolforge (which runs lots of tools, but does tend to have some downtime) or on the production Kubernetes cluster (which runs other non-MediaWiki production services such as EventBus, Parsoid, etc.)? Either one would help minimize our maintenance load.
      • If we run in a Cloud Services project, how are we going to maintain the service with two engineers?
  • There isn't really a good way to run a service within an extension so far as I can see. The on-wiki options for doing things with reverts would be:
    • Using the RevisionFromEditComplete hook to run a non-blocking function that calls the revert risk model on the completed revision and processes it. This hangs a lot of network-bound code off of a page request and is asking for problems, IMO.
    • Polling the API via a job and then calling the revert risk model and processing the result. This would be slower than streaming, and I think it is still ultimately tied to page requests because of how jobs run. Even more network traffic than the above option.

How much worse of a situation in terms of timeline or effort would it be for us to build an extension?

With the core functionality of reverting edits, it's not about timelines; shoehorning something that merits a long-running service into a function that gets repeatedly triggered by page requests is not a good design and we shouldn't do it.
*However*, I see no reason to avoid an extension where it serves a purpose; I could see using an extension for notifications of reverts and for reporting. It would also require less setup for i18n for any user-facing pieces we put there.

We'll need to evaluate the UX tradeoffs of being on- or off-wiki in a little more detail:

let me loop through some user stories. Of course, these are all just my initial assessments:

As a moderator, I want to configure Automoderator with thresholds and settings that my community has agreed on, so that we feel confident it is acting in the way we want.

  • onwiki vs offwiki - wash
  • For discoverability, I'm not sure that there will be a big difference between links to a special page and links to an externally hosted tool. Instead of talking about this in terms of where the code is hosted, we could talk about where the entry points need to be for discoverability. Even if we have a special page on-wiki, users won't know to go to it unless it is exposed to them.
  • If we determine that we really need config on-wiki, we could have the service be configurable via an API POST request and set up a form on a special page that sends that request on submit. We would need to keep those in sync.

As a moderator, I want Automoderator to only take actions on edits which it is qualified to make judgements on, so that the number of false positive reverts is minimized.

  • onwiki vs offwiki - wash
  • for performance reasons, some of the configured items may need to be set up in the function that filters and consumes the stream. We'll need to do another round of investigation on Flink vs. ChangeProp to see if one has an advantage there in terms of configurability, but neither one of them runs on-wiki.

As a moderator, I want to test different Automoderator settings against recent edits, so that I can understand what will happen when I save configuration changes.

  • onwiki vs offwiki - offwiki
  • we have some existing work on this available, and it's off wiki

As a new good faith editor, I want to know when Automoderator has reverted one of my edits and be given clear steps for reporting the false positive, so that I can have my edit reinstated.

  • onwiki vs offwiki - onwiki looks like the best at first blush, but it might be down to what our moderator communities want. I think more user research is warranted.
  • question: (how) are we going to handle temp/ip users?
  • talk page approach:
    • onwiki vs offwiki - wash
  • editcheck approach:
    • onwiki vs offwiki - onwiki

As a moderator, I want to review false positive reports from new editors, so that I can reinstate good edits which shouldn’t have been reverted.

  • onwiki vs offwiki - onwiki

As a Wikimedia Foundation researcher, I want false positive report data to be available to me so that I can retrain the model and make it more accurate.

  • onwiki vs offwiki - too early to call
  • this is so wide open right now that it's too early to call the best approach.

I'd find it useful if you could delve a little deeper into the pros and cons of these options:

I erred on the side of succinctness, but I can see that I should have shown my work better: [snip]

One story missing here is third parties. Third-party uptake of tools hosted on Toolforge or WMF Labs is virtually zero, if not zero. Is that what you mean when you say external service? If you would like this tool to be useful to / see uptake in the wider ecosystem, it needs to be part of MediaWiki.

I also anecdotally happen to agree that tools elsewhere are indeed less discoverable than onwiki.

IMO the functionalities that you'd definitely want to put in an extension are:

  • add a change tag
  • notification (by default new users don't get notified about reverts, and even if we change that, you'd probably want to use an Automoderator-specific notification that makes it clear to the user that this was an automated revert that might be wrong, and gives a hint on how to dispute it)
  • edit summary (especially on multilingual wikis like Commons, there is no reason not to make use of translatable edit summaries)
  • maybe customize the diff UI for Automoderator reverts to add some extra information for debugging and disputing
  • maybe bot-flag vandalism to make it less disruptive? this is something that MediaWiki's revert action can do (but reverts have not-very-useful edit summaries so you probably don't want to use reverts directly)
  • community configuration, eventually (it makes sense to not block the initial version on it) - an audit trail with notifications is important for the configuration of such a tool, and there is no point in reimplementing that. Also, external tools tend to die after a few years, and it's good if the logs don't die with them.

For onwiki vs. offwiki, I think from a technical perspective onwiki is less ideal but doable (it could use a DeferredUpdate triggered from RevisionFromEditComplete), but it's more of a burden on the developers (MediaWiki is complex, you are bound to the train for code updates, debugging is harder, etc.). The converse is that MediaWiki-based tools have a much better track record of staying functional for a long time after the person or team who created them moves on, because they are written in a framework that most WMF and volunteer developers are familiar with, and the code is more discoverable. My assumption is that this is not as important for Automoderator, because it's not the kind of thing that can do its thing forever without tuning, so once it gets abandoned it should probably get disabled soon anyway.

The tricky part, then, is how to pass information between the tool and the extension. There are a few ways (one variant is sketched after this list):

  • create a dedicated API, which either incorporates the revert functionality, or is called after the normal edit API call and passed a revision ID
  • abuse the edit summary (since you'll probably use autosummaries, this is not as terrible as it sounds)
  • implement something like the visualeditoredit API's plugins field for the core edit API, so tools can pass along structured edit data and various extensions can act on it. IMO this would be nice, there are other things outside Automoderator it could be used for.
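
As a rough sketch of the tool-side half of this, here is what a revert performed through the standard Action API could look like, using a change tag as the information channel an extension keys off. The authenticated requests.Session and the "automoderator" tag name are assumptions for illustration:

API_URL = "https://en.wikipedia.org/w/api.php"


def revert_via_action_api(session, title, bad_rev_id, summary):
    """Undo a revision through the standard edit API, tagging it for the extension.

    `session` is an already-authenticated requests.Session.
    """
    # Fetch a CSRF token for the authenticated session.
    token = session.get(API_URL, params={
        "action": "query", "meta": "tokens", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]

    # The change tag is the information channel: an on-wiki extension can key
    # notifications, diff UI tweaks, etc. off of it.
    return session.post(API_URL, data={
        "action": "edit",
        "title": title,
        "undo": bad_rev_id,
        "summary": summary,
        "tags": "automoderator",  # assumed tag name, registered by the extension
        "token": token,
        "format": "json",
    }).json()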

maybe bot-flag vandalism to make it less disruptive? this is something that MediaWiki's revert action can do (but reverts have not-very-useful edit summaries so you probably don't want to use reverts directly)

More like the opposite: ensure the autoreverter edits are visible and are easily accessible to human patrollers. Especially if this is not configurable, we should err on the side of caution.

I'd find it useful if you could delve a little deeper into the pros and cons of these options:

I erred on the side of succinctness, but I can see that I should have shown my work better: [snip]

One story missing here is third parties. Third-party uptake of tools hosted on Toolforge or WMF Labs is virtually zero, if not zero. Is that what you mean when you say external service? If you would like this tool to be useful to / see uptake in the wider ecosystem, it needs to be part of MediaWiki.

Thanks for raising this. The model that we're using is trained on Wikipedia specifically so as I understand it is unlikely to be usable by third parties.

Thanks for raising this. The model that we're using is trained on Wikipedia specifically so as I understand it is unlikely to be usable by third parties.

Other models will be needed just within the WMF fleet...? I expect the models for Wikidata and Wikiversity (as examples) will need to look quite different. Wikidata because of the content model difference in the main space and Wikiversity because they allow different kinds of content. Commons because their vandalism comes in the flavor of uploading inappropriate images...

And for other languages....?

Thanks for raising this. The model that we're using is trained on Wikipedia specifically so as I understand it is unlikely to be usable by third parties.

Other models will be needed just within the WMF fleet...? I expect the models for Wikidata and Wikiversity (as examples) will need to look quite different. Wikidata because of the content model difference in the main space and Wikiversity because they allow different kinds of content. Commons because their vandalism comes in the flavor of uploading inappropriate images...

And for other languages....?

I can speak to the languages part: we'll be using the language-agnostic revert risk model. It scores the revert probability from revision metadata: how many edits have been made to the page, how much content was added/removed, how many edits the user has made, and the user's rights.

*edited to add*

It also includes article quality features, though these are considered language-agnostic by the ML folks. We're outside my expertise at this point:
https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_revert_risk#Model
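
For reference, a minimal sketch of requesting a score from that model via the public LiftWing endpoint; the URL and response shape follow the public API documentation as I understand it and may change:

import requests

LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-language-agnostic:predict"
)


def revert_risk(rev_id, lang="en"):
    """Return the model's probability that the given revision will be reverted."""
    response = requests.post(
        LIFTWING_URL,
        json={"rev_id": rev_id, "lang": lang},
        headers={"User-Agent": "automoderator-investigation-example"},
        timeout=30,
    )
    response.raise_for_status()
    # Documented response shape:
    # {"output": {"prediction": ..., "probabilities": {"true": ..., "false": ...}}}
    return response.json()["output"]["probabilities"]["true"]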

Thanks for raising this. The model that we're using is trained on Wikipedia specifically so as I understand it is unlikely to be usable by third parties.

Other models will be needed just within the WMF fleet...? I expect the models for Wikidata and Wikiversity (as examples) will need to look quite different. Wikidata because of the content model difference in the main space and Wikiversity because they allow different kinds of content. Commons because their vandalism comes in the flavor of uploading inappropriate images...

You're right - we may need to engineer this in a way that the model being used can change based on context or configuration. To your original point though - we don't plan to prioritise support for third party MediaWiki installations. We're really focused on delivering tools to the Wikimedia community, so our solutions are going to focus on problems in this context.

Thanks for raising this. The model that we're using is trained on Wikipedia specifically so as I understand it is unlikely to be usable by third parties.

Other models will be needed just within the WMF fleet...? I expect the models for Wikidata and Wikiversity (as examples) will need to look quite different. Wikidata because of the content model difference in the main space and Wikiversity because they allow different kinds of content. Commons because their vandalism comes in the flavor of uploading inappropriate images...

You're right - we may need to engineer this in a way that the model being used can change based on context or configuration. To your original point though - we don't plan to prioritise support for third party MediaWiki installations. We're really focused on delivering tools to the Wikimedia community, so our solutions are going to focus on problems in this context.

To follow up on this a bit:

There are a few technical limitations that mean our scope is even more narrow than that, at least initially. Our initial rollout is focused specifically on articles on WMF-hosted wikis. I don't think this model would work on any non-article content. We can see a future in which additional models are added to support automoderation of other kinds of content, meaning any implementation should support using other models and applying different actions. We talked about this within the team, and it should be noted here too.

All software projects come with numerous tradeoffs; let's talk about some of them, so that they aren't implicit:

  • by placing the code that actually performs reverts in a system that is decoupled from our MediaWiki release process, we can lower the friction for software development participation from volunteers, but we do take on some long-term operational risks, as mentioned by @Tgr
  • by setting up an event-stream-based system to request scores from the model, we gain scalability/operations advantages for running this at volume, but we raise the bar of entry for non-Wikimedia projects, since they would need to run more infrastructure to use the tool themselves.

Also, my use of the phrase "external service" was too vague. External in relation to what? In this case, I meant external to the MediaWiki service running on the production Kubernetes cluster. In this sense, PHP code firing a post-revision-creation hook in an extension would be "internal", and pretty much anything else would be "external", e.g. something running on Toolforge, in a Cloud Services project, or as a separate service on the production Kubernetes cluster.

Each of these options comes with its own significant trade-offs in terms of friction for change vs. long-term operational stability.

jsn.sherman updated the task description. (Show Details)
jsn.sherman updated the task description. (Show Details)

I removed some of the Q/A about the revert component for legibility; I'll be following up in T349295: Determine technical approach for Automoderator edit revert component

@jsn.sherman I noticed that some of the technical decisions taken in this task are limiting the number of revisions to be analyzed:

  • the inability of the models to score the first revision in a page,
  • using the scores saved locally in MediaWiki, which, at least in the case of ORES, might be missing or differ from the score returned by the API
  • skipping revert wars

Is there any study on the percentage of edits this tool will evaluate?

@jsn.sherman I noticed that some of the technical decisions taken in this task are limiting the number of revisions to be analyzed:

  • the inability of the models to score the first revision in a page,
  • using the scores saved locally in MediaWiki, which, at least in the case of ORES, might be missing or differ from the score returned by the API
  • skipping revert wars

Is there any study on the percentage of edits this tool will evaluate?

This is something I'll put on our roadmap to look into - thanks for the suggestion.

jsn.sherman updated the task description. (Show Details)

@jsn.sherman I noticed that some of the technical decisions taken in this task are limiting the number of revisions to be analyzed:

  • the inability of the models to score the first revision in a page,
  • using the scores saved locally in MediaWiki, which, at least in the case of ORES, might be missing or differ from the score returned by the API
  • skipping revert wars

Is there any study on the percentage of edits this tool will evaluate?

Please see T352026#9480565 for an answer! :)