Description
- Run Bad-Words-Detection-System to get potential badword list (See T160752: Korean generated word lists are in chinese)
- Human review of BWDS list
- Integrate into revscoring
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Ladsgroup | T161621 Deploy ORES Review Tool for hewiki
Resolved | | Halfak | T160638 Deploy ORES late march
Resolved | | Halfak | T161616 Train/test reverted model for kowiki
Resolved | | Halfak | T160757 Add language support for Korean
Event Timeline
(Just FYI:) P5072 was added a few minutes before Halfak made P5073, and P5072 is the authoritative list.
@revi, I almost pulled this to our main workboard, but I realized that we still need a list of "informals". @Ladsgroup said that he's updated https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/ko with a new run of BWDS. Could you have a look at it to see if it is any more useful?
Alternatively, you could help us build a list of informals from your own knowledge. See the English informals for a large set of examples of the kind of thing we're looking for. https://github.com/wiki-ai/revscoring/blob/master/revscoring/languages/tests/test_english.py#L87
Unfortunately, I have to say the updated BWDS run is still meaningless except for one entry.
Also, the informals list is what I was going to work on tomorrow.
Gotcha. Sounds good. Sorry for the BWDS issues for Korean. I've been working on that a lot in the last week.
I know the list is broad, but paragraphs ending with the words in P5122 are very likely to be informal and not encyclopedic, so P5122 is the list. (The list is quite small, so I'll need to adjust it quite often.)
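For anyone wanting to try the "paragraph ends with one of these words" idea before it goes into revscoring, here is a minimal sketch in plain Python. It does not use the revscoring API. The entries in `INFORMAL_ENDINGS` are hypothetical placeholders, not the contents of P5122, and the function names are made up for illustration.

```python
import re

# Hypothetical placeholder endings; the real entries live in P5122
# and are not reproduced here.
INFORMAL_ENDINGS = ["lol", "haha"]


def build_ending_pattern(endings):
    """Compile a regex matching any ending at the very end of a paragraph."""
    # Longest first, so a longer ending wins over a shorter one that is
    # a prefix of it.
    ordered = sorted(endings, key=len, reverse=True)
    alternatives = "|".join(re.escape(e) for e in ordered)
    # Allow trailing punctuation or whitespace after the ending.
    # No word boundary is required before the ending, so suffix-style
    # endings attached to a word also match.
    return re.compile(r"(?:%s)[\s.!?~]*$" % alternatives)


def informal_paragraphs(text, pattern):
    """Return the paragraphs (split on blank lines) that end informally."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs if pattern.search(p)]


if __name__ == "__main__":
    sample = "This is fine.\n\nthat was great lol!\n\nAnother haha"
    for paragraph in informal_paragraphs(sample, build_ending_pattern(INFORMAL_ENDINGS)):
        print(paragraph)
```

Since the list is expected to change often, keeping the endings in a plain list and rebuilding the pattern each time is probably the simplest way to adjust it.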