- Run Bad-Words-Detection-System (BWDS) to generate a list of potential bad words
- Human review of the BWDS list
- Integrate into revscoring
Description
Details
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
ores: install aspell-sv | operations/puppet | production | +1 -1 |
Status | Subtype | Assigned | Task |
---|---|---|---|
Resolved | | Ladsgroup | T131450 Gather language assets for Swedish |
Resolved | | Ladsgroup | T135604 Train a `reverted` model for svwiki |
Event Timeline
@Josve05a It would be great if you could review the list generated by the bot and divide the words into three lists:
- Words that are not acceptable anywhere in Wikipedia, like 'f**ck' or 'sh*t'. I understand these words are fine in articles about the subject itself, but that is not a problem because we consider the proportion of added/changed words, not the total number of them.
- Words that are not okay in Wikipedia articles but are fine on talk pages, like 'Hey', 'LOL', etc.
- Words that are none of the above, i.e. false positives picked up by the bot.
Put the lists somewhere and we will do the rest.
Thanks
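To illustrate the point about proportions, here is a minimal sketch in plain Python. It is not revscoring's actual feature code: the word lists are placeholders, and `tokenize`, `added_tokens`, and `proportion_in` are hypothetical helper names chosen only for this example. It assumes an edit is scored on the words it adds, so a bad word that already exists in an article does not count against later edits.

```python
import re
from collections import Counter

# Placeholder lists; the real ones would come from the human review above.
BADWORDS = {"badworda", "badwordb"}
INFORMALS = {"hey", "lol"}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def added_tokens(old_text, new_text):
    """Return tokens that occur more often in new_text than in old_text."""
    diff = Counter(tokenize(new_text)) - Counter(tokenize(old_text))
    return list(diff.elements())


def proportion_in(tokens, wordlist):
    """Share of tokens found in wordlist (0.0 if there are no tokens)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in wordlist) / len(tokens)


old = "Stockholm is the capital"
new = "Stockholm is the capital lol badworda hey nice"
added = added_tokens(old, new)
print(proportion_in(added, BADWORDS), proportion_in(added, INFORMALS))
```

Keeping the two lists separate lets the informal words be weighted differently for articles and talk pages.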
It's great! Thank you. Please keep in mind that you should sort only the "generated words", not the "generated common words". I mention this because your list contains 379 "Unknown" words, but there are only 250 generated bad words in total, so my guess is that you are also working on the generated common words.
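A quick way to check for this kind of mismatch is a set comparison between the reviewed words and the generated bad-word list. The sketch below is only an assumption about how such a check could look; `split_reviewed` and the sample words are hypothetical, and it is not part of BWDS.

```python
def normalize(words):
    """Lowercase and strip words, dropping empty entries."""
    return {w.strip().lower() for w in words if w.strip()}


def split_reviewed(reviewed, generated_badwords):
    """Return (in_scope, out_of_scope) as sorted lists.

    in_scope: reviewed words that are in the generated bad-word list.
    out_of_scope: reviewed words that are not, e.g. common words.
    """
    reviewed_set = normalize(reviewed)
    generated_set = normalize(generated_badwords)
    return (
        sorted(reviewed_set & generated_set),
        sorted(reviewed_set - generated_set),
    )


# Hypothetical sample data for illustration only.
in_scope, out_of_scope = split_reviewed(
    ["BadwordA", "och", "badwordb "],
    ["badworda", "badwordb", "badwordc"],
)
print(in_scope, out_of_scope)
```

Any words reported as out of scope would not need to be sorted.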
I've gone through the rest of them, sorted them, and left comments on a group of words that are often used in vandalism but that I guess have more common legitimate uses than most of the words on the list. @Josve05a, feel free to go through my edits and see if there's anything you disagree with; some of the remaining ones weren't exactly obvious.
Change 289162 had a related patch set uploaded (by Ladsgroup):
ores: install aspell-sv