Add features like:
- matching_lang_badwords
- english_lang_badwords
- matching_lang_dictwords
Be careful not to penalize languages that we don't have assets for.
Consider using the known alphabet of a language as a feature.
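A minimal sketch of how the three proposed features could be computed, assuming simple per-language word sets. The `BADWORDS` and `DICTWORDS` tables, the `word_features` function, and the `has_*_assets` flags are hypothetical names for illustration, not part of the existing feature set. The flags are one possible way to avoid penalizing languages without assets: they let a model tell "no matches" apart from "no word list available".

```python
import re

# Hypothetical per-language assets; real word lists would be loaded elsewhere.
BADWORDS = {
    "en": {"stupid", "idiot"},
    "de": {"dumm", "idiot"},
}
DICTWORDS = {
    "en": {"city", "river", "capital"},
    "de": {"stadt", "fluss", "hauptstadt"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_features(text, lang):
    """Return word-match counts for a label/description in language `lang`."""
    tokens = tokenize(text)

    def count(words):
        return sum(1 for token in tokens if token in words)

    return {
        "matching_lang_badwords": count(BADWORDS.get(lang, set())),
        "english_lang_badwords": count(BADWORDS["en"]),
        "matching_lang_dictwords": count(DICTWORDS.get(lang, set())),
        # Flags so that a zero count for an unsupported language is not
        # read the same way as a zero count for a supported one.
        "has_badword_assets": lang in BADWORDS,
        "has_dictword_assets": lang in DICTWORDS,
    }
```

For a language with no assets, the matching-language counts are zero and both flags are `False`, while `english_lang_badwords` is still computed.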
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | johl | T127047 Collection of topics for HPI hackathon
Resolved | | Lydia_Pintscher | T127473 Increase signal of feature set for Wikidata model
Resolved | | Halfak | T171505 Late-July 2017 ORES deploy
Resolved | | Ladsgroup | T162617 Use 'informals', 'badwords', etc. in Wikidata feature set
Resolved | | Ladsgroup | T170834 Add basic bad word check to Wikidata feature set
Resolved | | Ladsgroup | T170835 Add entropy-related and uppercase-related measures to comments
See https://github.com/wiki-ai/bwds/blob/master/dump_based_detection.py#L71 for known alphabets
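As a rough illustration of the alphabet idea, the feature could be the fraction of letters in a string that belong to the language's known alphabet. The `ALPHABETS` table and `alphabet_fraction` function below are hypothetical and do not reproduce the linked bwds code; returning `None` for unknown languages is an assumption meant to keep languages without assets from being penalized.

```python
# Hypothetical alphabets for illustration; the real ones live in bwds.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "de": set("abcdefghijklmnopqrstuvwxyzäöüß"),
}


def alphabet_fraction(text, lang):
    """Fraction of letters in `text` that are in the alphabet of `lang`.

    Returns None when no alphabet is known for `lang` or when the text
    contains no letters, so the value can be treated as missing.
    """
    alphabet = ALPHABETS.get(lang)
    if alphabet is None:
        return None
    letters = [char for char in text.lower() if char.isalpha()]
    if not letters:
        return None
    return sum(char in alphabet for char in letters) / len(letters)
```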
Yes, we are. When people edit Wikidata using the GUI, it adds what they changed as the edit summary. See https://www.wikidata.org/w/index.php?title=Special:RecentChanges&hidenondamaging=1 for examples.
Oh! I see! Your approach is a very interesting solution. I'm OK with calling this done :)