- generate TF-IDF badword lists
- review and aggregation of badwords/informal words by a native speaker
- implement revscoring.Language (Language utility)
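For readers unfamiliar with the first step, here is a minimal sketch of TF-IDF ranking using only the Python standard library. It is not the script actually used for this task: the whitespace tokenizer, the function names `tfidf_scores` and `top_candidates`, and the idea of treating each text as one document are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {word: tf-idf score} dict per document.

    Assumption: naive lowercase + whitespace tokenization, which is
    only a placeholder for a real, language-aware tokenizer.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Term frequency times inverse document frequency.
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_candidates(documents, index, k=3):
    """Return the k highest-scoring words of documents[index]."""
    ranked = sorted(
        tfidf_scores(documents)[index].items(),
        key=lambda item: (-item[1], item[0]),  # score desc, then word
    )
    return [word for word, _ in ranked[:k]]


# Hypothetical sample texts, not real data from this task.
sample_docs = [
    "the page is good",
    "the page is stupid stupid",
    "the cat sat",
]
candidates = top_candidates(sample_docs, 1, k=1)
```

Words that are frequent in one text but rare across the others score highest, which is why such a list still needs the native-speaker review in the second step: a high score only marks a word as distinctive, not as offensive.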
You need to either download the files and open them in Notepad (or gedit, or anything suitable), or, in your browser, find the encoding option (probably in the View menu) and choose "UTF-8" or "Unicode".
There's nothing wrong with these files regarding encoding.
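To illustrate why the files can look broken even though they are fine, here is a small sketch, assuming the word lists are UTF-8 text with one word per line. The helper name `read_word_list` and the sample word are hypothetical; the point is only that the same bytes read correctly as UTF-8 and turn into garbage when a viewer guesses a single-byte encoding.

```python
from pathlib import Path


def read_word_list(path):
    """Read one word per line, decoding the file explicitly as UTF-8."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# The word "Urdu" written in Urdu script, used as sample data.
raw = "اردو".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # correct: four Arabic-script letters
as_latin1 = raw.decode("latin-1")  # wrong guess: eight unrelated symbols
```

Each of the four letters takes two bytes in UTF-8, so a viewer that assumes one byte per character shows eight unrelated symbols instead.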
I fixed the encoding and copied it to the wiki here: https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/ur
@MuhammadShuaib, could you take another look? We need you to move words from "list-generated" to "list-badwords" (racial slurs, curse words, offensive language) and "list-informals" (casual talk, e.g., "hello", "haha", "lol", "wat"). Please let me know if you have any questions.
This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!
For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)