Implement regex-based badwords detector
Closed, Resolved · Public

Description

A regex-based detector should be more powerful than the stemmer-matching strategy we are using now.
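As a rough illustration of the idea, here is a minimal sketch of regex-based matching in Python. The pattern list and the names `BADWORD_PATTERNS`, `compile_badwords`, and `find_badwords` are hypothetical and are not taken from the project's code or from any on-wiki list. It only shows why regex fragments can catch variants (repeated letters, inflections) that matching on stems alone might miss.

```python
import re

# Hypothetical patterns, for illustration only; not from any real badword list.
BADWORD_PATTERNS = [
    r"stupid\w*",      # base word plus any suffix, e.g. "stupidity"
    r"idi+ot+s?",      # tolerates repeated letters, e.g. "idiiiot"
    r"l+o+s+e+r+s?",   # every letter may be stretched, e.g. "loooser"
]


def compile_badwords(patterns):
    """Join the fragments into one case-insensitive, word-bounded regex."""
    joined = "|".join("(?:%s)" % p for p in patterns)
    return re.compile(r"\b(?:" + joined + r")\b", re.IGNORECASE)


BADWORDS_RE = compile_badwords(BADWORD_PATTERNS)


def find_badwords(text):
    """Return every badword match in the text, in order of appearance."""
    return [m.group(0) for m in BADWORDS_RE.finditer(text)]
```

Calling `find_badwords` on a revision's text gives the list of matched strings, which could then be counted as a feature. The word boundaries keep the patterns from firing inside unrelated words.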

Event Timeline

Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak moved this task to Parked on the Machine-Learning-Team (Active Tasks) board.
Halfak subscribed.

@ToAruShiroiNeko noted that:

There appears to be a badword list on tr.wikipedia I was unaware of. We should exploit this resource. I don't think we can handle regexes yet.

https://tr.wikipedia.org/wiki/Kullan%C4%B1c%C4%B1:Manco_Capac/badwords

Halfak triaged this task as Medium priority. Jun 24 2015, 9:56 PM
Halfak set Security to None.

I see freeform badwords regex lists, and not just stemmer strategies. Is this task done?