This task covers the work of determining which languages the first version of Peacock Check will support. That decision will in turn affect which wikis we approach as partners for this work (T387921).
Selection criteria
We are seeking languages/wikis where:
- The model – as currently conceived – will perform with high enough precision for volunteers seeing Peacock Check(s) to consider them reliable/useful
- Newcomers and junior contributors publish new content containing peacock language at a high enough rate for experienced volunteers to perceive this as an important issue to address
- Training data is accessible enough for evaluating the model to be relatively straightforward
Selection process
- Verify the cost of gathering training/evaluation data for the languages the previous BERT model considered (see the third selection criterion above)
- Evaluate model performance on the languages for which training/evaluation data is relatively low-cost/low-effort to gather, and identify which languages are likely launch-ready and which would require us to update the model
- For each of the languages that would require an update to the model, get a sense for how often peacock edits get reverted. This will help us prioritize whether to update the model, or just launch with the launch-ready languages
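As a rough illustration of how the last two steps could fit together, here is a minimal sketch that computes per-language precision, splits languages into launch-ready vs. needs-model-update, and ranks the latter by how often peacock edits get reverted. Everything in it is an assumption made for illustration: the `LanguageStats` fields, the `triage` helper, the 0.8 precision threshold, and the placeholder language codes and counts are hypothetical and not taken from any real evaluation.

```python
from dataclasses import dataclass


@dataclass
class LanguageStats:
    """Hypothetical per-language counts; field names are assumptions."""
    true_positives: int          # flagged edits that really contain peacock language
    false_positives: int         # flagged edits that do not
    peacock_edits: int           # edits known to contain peacock language
    reverted_peacock_edits: int  # of those, how many were reverted


def precision(true_positives: int, false_positives: int) -> float:
    """Share of the model's flags that are correct (0.0 if nothing was flagged)."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def revert_rate(stats: LanguageStats) -> float:
    """Share of peacock edits that were reverted (0.0 if there were none)."""
    if stats.peacock_edits == 0:
        return 0.0
    return stats.reverted_peacock_edits / stats.peacock_edits


def triage(stats_by_lang: dict[str, LanguageStats], precision_threshold: float = 0.8):
    """Split languages into launch-ready and needs-update.

    The threshold is a placeholder, not an agreed-upon value.
    Languages needing a model update are sorted by revert rate, highest first,
    so the ones where peacock edits are reverted most often come first.
    """
    launch_ready = []
    needs_update = []
    for lang, stats in stats_by_lang.items():
        if precision(stats.true_positives, stats.false_positives) >= precision_threshold:
            launch_ready.append(lang)
        else:
            needs_update.append((lang, revert_rate(stats)))
    needs_update.sort(key=lambda item: item[1], reverse=True)
    return launch_ready, needs_update


# Made-up counts for placeholder language codes, for illustration only.
EXAMPLE_STATS = {
    "aa": LanguageStats(90, 10, 300, 30),
    "bb": LanguageStats(60, 40, 200, 50),
    "cc": LanguageStats(50, 50, 100, 60),
}

if __name__ == "__main__":
    ready, to_update = triage(EXAMPLE_STATS)
    print("launch-ready:", ready)
    print("needs model update (by revert rate):", to_update)
```

Sorting the needs-update group by revert rate is only one possible way to prioritize; the actual decision would likely also weigh the cost of gathering more training/evaluation data for each language.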