- Run Bad-Words-Detection-System to get potential badword list
- Human review of BWDS list
- Integrate into revscoring
Event Timeline
Tagalog Wikipedia (and the other Tagalog sister wikis) are frequently hit by vandalism, and revision scoring would help the Tagalog community combat it. I am a native speaker of Tagalog (and Filipino), so I can help with the human review.
I notified the Tagalog Wikipedia community about this; the notice is at https://tl.wikipedia.org/wiki/Usapang_Wikipedia:Kapihan#Revision_scoring.
@Ladsgroup Hello, can you run your bot to generate a list of bad words from tlwiki, tlwikibooks, and tlwiktionary? I will help in the human review after the generation of the list. Thanks.
I'm not sure if I did already or not (don't have access right now) but I'll check and run if needed ASAP
Okay, I started the job in tools:
tools.dexbot@tools-bastion-03:~/pywikibot-core$ jsub -once -N tl_bwds -mem 7g -l release=trusty /data/project/dexbot/pywikibot-core/p3_2/bin/python /data/project/dexbot/pywikibot-core/pwb.py /data/project/dexbot/pywikibot-core/scripts/dump_based_detection_beta.py /public/dumps/public/tlwiki/20161001/tlwiki-20161001-pages-meta-history.xml.bz2
Your job 49938 ("tl_bwds") has been submitted
I'm guessing it'll be done within the next 24 hours, and then you can check the results at https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/tl
It is done now. We need a native speaker to review the list based on this manual and then re-assign this task to me so I can start integrating Tagalog into revscoring.
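For illustration only, here is roughly how a human-reviewed word list could be turned into a matcher. This is a plain-Python sketch, not the revscoring API: the function names `build_badword_matcher` and `count_badwords` are hypothetical, and the two list entries are placeholders rather than words from the actual Tagalog list.

```python
import re


def build_badword_matcher(words):
    """Compile a reviewed word list into one case-insensitive regex."""
    unique = set(words)
    if not unique:
        raise ValueError("the reviewed word list is empty")
    # Longest entries first, so a longer word is tried before any shorter
    # entry that happens to be its prefix.
    ordered = sorted(unique, key=len, reverse=True)
    escaped = [re.escape(word) for word in ordered]
    # \b keeps matches on whole words only.
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def count_badwords(matcher, text):
    """Return how many whole-word matches appear in the text."""
    return len(matcher.findall(text))


# Placeholder entries standing in for the human-reviewed list (not real data).
reviewed = ["badword", "worseword"]
matcher = build_badword_matcher(reviewed)
```

A count like this could then serve as one signal for scoring a revision; how revscoring actually consumes the list is not shown here.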
The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status, as tasks should not be stalled (and then potentially forgotten) for years for unclear reasons.