When we gathered 20,000 revisions, we ended up with too few uncertain revisions in fawiki because bots and good users are more heavily involved on this wiki.
- Take a random sample of roughly 50,000 revisions.
- Auto-label that 50,000-revision sample, which leaves 50,000 - x auto-labeled revisions and x revisions that need wikilabels.
- Randomly resample 18,000 of the auto-labeled revisions and 2,000 of the revisions for wikilabels.
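The resampling step above can be sketched roughly as follows. This is only an illustration, not an existing script: `resample` and the `needs_review` callback are hypothetical names, and the callback stands in for whatever the auto-labeler decides about a revision.

```python
import random


def resample(revisions, needs_review, n_auto=18000, n_review=2000, seed=0):
    """Split revisions by the auto-labeler's verdict, then draw a fixed
    number from each group.

    `needs_review` is a hypothetical callback: it returns True when a
    revision could not be auto-labeled and has to go to wikilabels.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    auto = [r for r in revisions if not needs_review(r)]
    review = [r for r in revisions if needs_review(r)]

    # Fail loudly if the initial sample was too small for the targets.
    if len(auto) < n_auto or len(review) < n_review:
        raise ValueError(
            f"need {n_auto} auto-labeled and {n_review} review revisions, "
            f"got {len(auto)} and {len(review)}"
        )

    return rng.sample(auto, n_auto), rng.sample(review, n_review)
```

If the second group comes up short (x < 2,000), the initial sample has to be larger than 50,000 for that wiki, which is why the sketch raises an error instead of silently returning fewer revisions.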
I want this to be handled independently of Quarry because I'd rather we not micromanage this any longer. We are getting more languages. It should, however, have some sort of config file so that we can fine-tune it as needed for other stuff.
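As one possible shape for that config file, here is a minimal sketch that keeps per-wiki overrides in JSON and falls back to the numbers from this task. The key names, the `load_sampling_config` helper, and the `otherwiki` entry are all made up for illustration.

```python
import json

# Defaults taken from the numbers in this task; the key names are hypothetical.
DEFAULTS = {"initial_sample": 50000, "auto_labeled": 18000, "needs_review": 2000}


def load_sampling_config(text, wiki):
    """Return the sampling settings for one wiki, using DEFAULTS for
    anything the config does not override."""
    overrides = json.loads(text).get(wiki, {})
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catch typos in the config instead of silently ignoring them.
        raise KeyError(f"unknown setting(s) for {wiki}: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Example config: fawiki uses the defaults, and "otherwiki" is an
# invented entry showing what an override would look like.
EXAMPLE = '{"fawiki": {}, "otherwiki": {"initial_sample": 80000}}'

print(load_sampling_config(EXAMPLE, "fawiki"))
print(load_sampling_config(EXAMPLE, "otherwiki"))
```

A new language would then only need a new entry in the file, or no entry at all if the defaults are fine.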