Technically this model is language-agnostic, but it requires a set of per-wiki statistics in order to calculate quality features:
- avg article length
- avg number of media
- avg number of categories
- avg number of headings
- avg number of wikilinks
- avg number of references
This task involves adding these values for new languages and updating the model binary to accurately reflect the total number of supported wikis.
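To illustrate why a wiki without these values cannot be scored, here is a minimal sketch of a per-wiki lookup. The names `WikiQualityStats`, `QUALITY_STATS` and `stats_for` are hypothetical and are not taken from `constants.py`; the wiki key and the numbers are placeholders, not real statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiQualityStats:
    """The six per-wiki averages listed above (hypothetical container)."""
    article_length: float
    media: float
    categories: float
    headings: float
    wikilinks: float
    references: float


# Placeholder entry for illustration only; "xxwiki" and its numbers are made up.
QUALITY_STATS = {
    "xxwiki": WikiQualityStats(1000.0, 1.0, 2.0, 3.0, 10.0, 4.0),
}


def stats_for(wiki_db):
    """Return the stats for a wiki, or fail loudly if the wiki has none."""
    try:
        return QUALITY_STATS[wiki_db]
    except KeyError:
        raise ValueError(f"no quality statistics for {wiki_db!r}") from None
```

With a layout along these lines, supporting a new language amounts to adding one more entry to the mapping.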
[] Add quality feature values for 35 new languages to [[ https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/4ad8fb32cdde8767e6e5a84865b130fc49484398/knowledge_integrity/constants.py#L285 | constants.py ]] using this [[ https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/309920e8ce16622ec46ee6eade8da74235c4a5c4/RRR/constants_article_quality_fill_min.csv | new CSV file ]] created by @diego (see the commit message for details on how default values were generated for wikis that lacked them).
[] Update the `supported_wikis` attribute on the serialized `RevertRiskModel`
[] Bump model version from 1.0 to 2.0
[] Test the new model binary and hand it off to the ML team
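The first three steps could be sketched roughly as follows. This is only an outline under stated assumptions: the CSV column names, the `xxwiki`/`yywiki` rows and the `StandInModel` class are hypothetical stand-ins, since the real CSV header and the real `RevertRiskModel` class are not reproduced in this task. Only the `supported_wikis` attribute name and the 1.0 to 2.0 bump come from the checklist.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical CSV layout with placeholder rows; the real header may differ.
SAMPLE_CSV = """wiki_db,length,media,categories,headings,wikilinks,references
xxwiki,1000,1,2,3,10,4
yywiki,2000,2,3,4,20,5
"""


def load_quality_stats(csv_text):
    """Parse CSV text into {wiki_db: {feature_name: value}}."""
    stats = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        wiki = row.pop("wiki_db")
        stats[wiki] = {name: float(value) for name, value in row.items()}
    return stats


@dataclass
class StandInModel:
    """Stand-in for the serialized model; not the real RevertRiskModel."""
    model_version: str = "1.0"
    supported_wikis: list = field(default_factory=list)


def refresh_model(model, stats, new_version):
    """Add the newly covered wikis to supported_wikis and bump the version."""
    model.supported_wikis = sorted(set(model.supported_wikis) | set(stats))
    model.model_version = new_version
    return model


stats = load_quality_stats(SAMPLE_CSV)
model = refresh_model(StandInModel(supported_wikis=["enwiki"]), stats, "2.0")
```

Deriving `supported_wikis` from the same mapping that holds the quality values keeps the two from drifting apart, which is the mismatch this task is meant to fix.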