Add support for a new language in ORES
Closed, Invalid; Public

Description

Supporting a new language in ORES requires language assets. To complete this task, we need to build three word lists for the language: "stopwords", "informals", and "badwords".

  • stopwords: Words that glue sentences together but carry little information (e.g., in English "a", "the", "because", and "for" are all stopwords)
  • badwords: Racial slurs, curse words, and other offensive language
  • informals: Informal language that is not offensive but doesn't belong in a Wikipedia article (e.g., in English "yolo", "momma", "hello", and "bye" are all informal language)
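
To make the three lists concrete, here is a minimal sketch of how they could be used to count matching words in a piece of text. It is only an illustration and not how revscoring itself is implemented: the names `STOPWORDS`, `INFORMALS`, `BADWORDS`, `count_matches`, and `summarize` are hypothetical, and the word sets are tiny placeholders built from the English examples above.

```python
import re

# Hypothetical placeholder lists; real lists would be much longer and
# compiled by people fluent in the target language.
STOPWORDS = {"a", "the", "because", "for"}
INFORMALS = {"yolo", "momma", "hello", "bye"}
BADWORDS = {"darn"}  # mild stand-in for genuinely offensive terms

# Split text into runs of word characters (assumes a space-delimited language).
TOKEN_RE = re.compile(r"\w+")


def count_matches(text, words):
    """Count how many tokens in `text` appear in the set `words`."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(1 for token in tokens if token in words)


def summarize(text):
    """Return the number of stopwords, informals, and badwords in `text`."""
    return {
        "stopwords": count_matches(text, STOPWORDS),
        "informals": count_matches(text, INFORMALS),
        "badwords": count_matches(text, BADWORDS),
    }


if __name__ == "__main__":
    print(summarize("Hello, the cat sat because YOLO"))
```

Simple whitespace-style tokenization like this is an assumption that will not hold for every language, which is one reason each new language needs its own assets.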

See https://github.com/wikimedia/revscoring/tree/master/tests/languages for a list of languages that are already supported.