To support a new language, we need language assets. Specifically, we need to build three word lists: "stopwords", "badwords", and "informals".
- stopwords: Words that glue sentences together but carry little information (e.g., in English "a", "the", "because", and "for" are all stopwords)
- badwords: Racial slurs, curse words, and otherwise offensive language
- informals: Informal language that is not offensive but doesn't belong in a Wikipedia article (e.g., in English "yolo", "momma", "hello", and "bye" are all informal language)
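To illustrate how these three lists act as lookup tables, here is a minimal sketch that counts how many words of a text fall into each list. It is not revscoring's API: the names `STOPWORDS`, `BADWORDS`, `INFORMALS`, `tokenize`, and `count_matches` are hypothetical, the tokenizer is deliberately simplistic, and the English lists are tiny placeholders taken from the examples above.

```python
import re

# Hypothetical placeholder lists for English, using the examples above.
# Real language assets would be much longer.
STOPWORDS = {"a", "the", "because", "for"}
INFORMALS = ["yolo", "momma", "hello", "bye"]
BADWORDS = []  # intentionally left empty in this sketch


def tokenize(text):
    """Split text into lowercase word tokens (a simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())


def count_matches(text, words):
    """Count the tokens in `text` that appear in the given word list."""
    vocab = set(words)
    return sum(1 for token in tokenize(text) if token in vocab)


text = "Hello! The team said bye because of the weather."
print(count_matches(text, STOPWORDS))  # 3: "the", "because", "the"
print(count_matches(text, INFORMALS))  # 2: "hello", "bye"
print(count_matches(text, BADWORDS))   # 0
```

Counts like these are the kind of signal the lists make possible: a high share of informals or badwords in an edit suggests it may not belong in an article.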
See https://github.com/wikimedia/revscoring/tree/master/tests/languages for a list of languages that are already supported.