In T346916 we generated datasets, which we made available through a testing process, to judge the accuracy of the Language Agnostic Revert Risk model. We would like to investigate moving to the Multilingual Revert Risk model, which will require a new round of testing. We want to know whether it's reliable and how different it is from the Language Agnostic version, and to set sensible thresholds to use in Automoderator's community configuration.
To start, please generate datasets for the same wikis as in T346916 so that we can make these datasets available to the community. We can then create a second version of the testing spreadsheet incorporating these datasets.
We can start with 2,000 edits per wiki - the 25,000 used in the original ticket turned out to be much more than we needed!
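For illustration only, the per-wiki sampling could be sketched roughly like this. It assumes the candidate edits are already available as (wiki, rev_id) pairs; that input shape, the function name `sample_edits_per_wiki`, and the fixed seed are all hypothetical and not taken from T346916. Scoring the sampled revisions with the Multilingual Revert Risk model is out of scope for the sketch.

```python
import random
from collections import defaultdict

EDITS_PER_WIKI = 2000  # sample size proposed in this task


def sample_edits_per_wiki(edits, per_wiki=EDITS_PER_WIKI, seed=0):
    """Return up to `per_wiki` randomly chosen revision IDs for each wiki.

    `edits` is an iterable of (wiki, rev_id) pairs. This input shape is an
    assumption made for the sketch, not the format used in T346916.
    """
    # Group revision IDs by wiki.
    by_wiki = defaultdict(list)
    for wiki, rev_id in edits:
        by_wiki[wiki].append(rev_id)

    # A seeded generator keeps the sample reproducible between runs.
    rng = random.Random(seed)
    sampled = {}
    for wiki in sorted(by_wiki):
        rev_ids = by_wiki[wiki]
        # Wikis with fewer edits than the target contribute everything they have.
        k = min(per_wiki, len(rev_ids))
        sampled[wiki] = sorted(rng.sample(rev_ids, k))
    return sampled
```

Sampling without replacement and with a fixed seed means the same input always yields the same dataset, which should make it easier to compare scores against the Language Agnostic round.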