- Run the Bad-Words-Detection-System (BWDS) to get a list of potential badwords
- Human review of the BWDS list
- Integrate into revscoring
Description
Status | Assigned | Task
---|---|---
Resolved | Catrope | T170723 Deploy ORES Review Tool & ORES-based RCFilters for Romanian & Albanian Wikipedia
Resolved | awight | T170485 ORES deployment - Mid July, 2017
Resolved | Halfak | T170490 Train reverted model for Bengali Wikipedia
Resolved | Ladsgroup | T162620 Add language support for Bengali
Resolved | Halfak | T164767 Fix bengali tokenization in deltas
Event Timeline
Started the bot to analyze Bengali Wikipedia:
tools.dexbot@tools-bastion-03:~/pywikibot-core$ jsub -once -N bn_bwds -mem 7g -l release=trusty /data/project/dexbot/pywikibot-core/p3_2/bin/python /data/project/dexbot/pywikibot-core/pwb.py /data/project/dexbot/pywikibot-core/scripts/dump_based_detection_beta.py /public/dumps/public/bnwiki/20170301/bnwiki-20170301-pages-meta-history.xml.bz2
Your job 3887310 ("bn_bwds") has been submitted
The results should be available for review at https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/bn in several hours.
Hey @Aftabuzzaman, it looks like our proposed lists are ready. https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/bn
See https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service/BWDS_review for instructions about what we need you to do next.
https://github.com/wiki-ai/revscoring/pull/309
The badword detection doesn't work the way I expect.
I think we need to work on https://github.com/halfak/deltas/blob/master/deltas/tokenizers/wikitext_split.py#L53 in order to fix this.
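For context, here is a minimal sketch of one way a tokenizer can split Bengali text mid-word. It is an assumption about the kind of failure involved, not a description of the linked deltas code: Python's `re` treats `\w` as alphanumerics plus underscore, which leaves out combining marks such as Bengali vowel signs (Unicode categories Mn/Mc). The helper names `naive_words` and `mark_aware_words` are hypothetical, and only the standard library is used.

```python
import re
import unicodedata


def naive_words(text):
    """Split on \\w+ runs; combining marks are NOT matched by \\w in Python's re."""
    return re.findall(r"\w+", text)


def mark_aware_words(text):
    """Hypothetical alternative: treat letters, marks, and numbers as word characters."""
    words, current = [], []
    for ch in text:
        # The first letter of the Unicode category: L = letter, M = mark, N = number.
        if unicodedata.category(ch)[0] in ("L", "M", "N"):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


sample = "বাংলা ভাষা"  # two Bengali words separated by a space

print(naive_words(sample))       # words are broken at every vowel sign
print(mark_aware_words(sample))  # words stay whole
```

Zero-width joiners (category Cf), which also occur in Bengali text, are not handled by this sketch.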
This has been fixed in deltas. Please update the deltas dependency to 0.4.6 and try again.
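A quick way to confirm that an environment has picked up the newer release could look like the sketch below. It assumes the package is installed under the distribution name `deltas` and that version strings are plain dotted numbers; `version_tuple`, `meets_minimum`, and `installed_deltas_version` are hypothetical helpers, not part of deltas or revscoring.

```python
import re
from importlib import metadata


def version_tuple(text):
    """Turn '0.4.6' (or '0.4.6rc1') into (0, 4, 6); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="0.4.6"):
    """True if the installed version is at least the minimum (simple numeric comparison)."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_deltas_version():
    """Return the installed version string, or None if 'deltas' is not installed."""
    try:
        return metadata.version("deltas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_deltas_version()
    if found is None:
        print("deltas is not installed")
    else:
        print(found, "ok" if meets_minimum(found) else "needs upgrade to >= 0.4.6")
```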