Improve Czech Language assets
Open, Lowest, Public

Description

https://github.com/wikimedia/revscoring/blob/master/revscoring/languages/czech.py contains our current list of "Stop words" (words that carry little meaning but glue sentences together), "Badwords" (racial slurs and curse words), and "Informals" (casual language that doesn't belong in articles). Let's review it and make it better.

To get started, modify the tests first. See https://github.com/wikimedia/revscoring/blob/master/tests/languages/test_czech.py
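
As a rough illustration of the test-first workflow, the sketch below builds a small pattern list and checks it against words that should and should not match. It is self-contained and does not use revscoring; `INFORMAL_PATTERNS`, `build_matcher`, and `find_matches` are hypothetical names, and the sample words are placeholders rather than entries copied from czech.py.

```python
import re

# Hypothetical placeholder patterns for illustration; NOT taken from czech.py.
INFORMAL_PATTERNS = [
    r"ahoj",   # casual greeting
    r"čau",    # casual greeting / goodbye
    r"haha+",  # "haha" with any number of extra trailing a's
]


def build_matcher(patterns):
    """Compile a case-insensitive, word-bounded alternation of the patterns."""
    joined = "|".join("(?:" + p + ")" for p in patterns)
    return re.compile(r"\b(?:" + joined + r")\b", re.IGNORECASE)


def find_matches(matcher, text):
    """Return every substring of text that the matcher accepts, in order."""
    return [m.group(0) for m in matcher.finditer(text)]


informal = build_matcher(INFORMAL_PATTERNS)

# Write the expectations first, then adjust the patterns until they pass.
SHOULD_MATCH = ["ahoj", "Čau", "hahaaa"]
SHOULD_NOT_MATCH = ["encyklopedie", "město", "ahojky"]

for word in SHOULD_MATCH:
    assert find_matches(informal, word) == [word], word
for word in SHOULD_NOT_MATCH:
    assert find_matches(informal, word) == [], word
```

The same idea carries over to the real test file: add the new expected matches and non-matches there first, watch the tests fail, then update the word lists until they pass.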

Event Timeline

Halfak created this task. May 15 2019, 2:30 PM
Harej triaged this task as Lowest priority. Jun 4 2019, 9:25 PM
Harej moved this task from Untriaged to Blocked on community input on the Scoring-platform-team board.