To complement our unsupervised strategy for learning clusters of damaging/not-damaging edits, we should also experiment with looking for bias in the places where we expect it to appear.
In order to do this, I imagine us withholding a random sample during revscoring train_test and testing the fitness of the model on interesting sub-sets of that sample, e.g. edits by newcomers, edits by anonymous users, etc.
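A minimal sketch of what that sub-set evaluation could look like, assuming a scikit-learn-style classifier, a held-out pandas DataFrame, and hypothetical boolean columns `is_anon` and `is_newcomer` (the real column names would come from whatever feature extraction we settle on):

```python
from sklearn.metrics import roc_auc_score


def evaluate_subsets(model, test_df, feature_cols, label_col="damaging"):
    """Score a pre-trained model on the full held-out set and on sub-sets."""
    # "is_anon" and "is_newcomer" are hypothetical stand-ins for real features.
    subsets = {
        "all edits": test_df,
        "anonymous editors": test_df[test_df["is_anon"]],
        "newcomers": test_df[test_df["is_newcomer"]],
    }
    results = {}
    for name, subset in subsets.items():
        # Skip sub-sets too small or too one-sided to compute ROC-AUC.
        if subset[label_col].nunique() < 2:
            continue
        scores = model.predict_proba(subset[feature_cols])[:, 1]
        results[name] = roc_auc_score(subset[label_col], scores)
    return results
```

Large gaps in fitness between "all edits" and a sub-set would be the signal we're looking for.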
We'll likely want to automate this because training and testing a model takes a long time. We'll also need to create a utility for running a new test-set through a pre-trained model.
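A first pass at that utility might look something like the following; the pickle-based model format and the TSV layout (feature columns plus a "damaging" label) are assumptions rather than a settled interface:

```python
import pickle
import sys

import pandas as pd
from sklearn.metrics import roc_auc_score


def main(model_path, test_set_path):
    # Load the pre-trained model rather than re-training it.
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    # Assumed layout: one row per observation, tab-separated,
    # with a "damaging" label column alongside the features.
    test_df = pd.read_csv(test_set_path, sep="\t")
    features = test_df.drop(columns=["damaging"])
    scores = model.predict_proba(features)[:, 1]
    print("ROC-AUC: {:.3f}".format(roc_auc_score(test_df["damaging"], scores)))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```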
This card is done when we have a plan for implementing a manual bias detection strategy and the specific tasks have been created.