Add language support for Finnish
Closed, Resolved · Public


Event Timeline

4shadoww created this task. Feb 20 2017, 5:30 PM
Restricted Application added a subscriber: Aklapper. Feb 20 2017, 5:30 PM

Finnish already has a word list, but it is 8 months old. Do we need a new word list, or are we fine with the old one?

Stryn added a subscriber: Stryn. Feb 20 2017, 6:52 PM
Zache added a subscriber: Zache. Feb 22 2017, 10:20 AM
Halfak added a subscriber: Halfak. Feb 23 2017, 3:22 PM

@4shadoww, I think the old one should work. 8 months isn't too old for this kind of signal.

Halfak triaged this task as Medium priority. Feb 23 2017, 3:26 PM
Halfak moved this task from Untriaged to Research & analysis on the Machine Learning Platform board.
Halfak updated the task description.

I have sorted the word list. Does it look OK?

Looks great. Are there any more words (or word variants) that you would like to add to the list before we encode it in our modeling library?

As an example, for English, we have many variants of curse words in our tests. E.g. "shit", "sh1t", "shiiit", etc.
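To illustrate how one pattern can cover several spellings, here is a minimal sketch using Python's standard `re` module. The pattern and the `find_variants` helper are hypothetical and written only for this example; they are not taken from the modeling library or from the English test set.

```python
import re

# Hypothetical pattern (an assumption for illustration): "sh", then one or
# more "i" or "1" characters, then "t", bounded by word boundaries.
VARIANT_PATTERN = re.compile(r"\bsh[i1]+t\b", re.IGNORECASE)


def find_variants(text):
    """Return every substring of text that matches the variant pattern."""
    return VARIANT_PATTERN.findall(text)


matches = find_variants("shit, sh1t and shiiit all match, but shirt does not")
```

The word boundaries keep the pattern from firing inside longer, unrelated words, which matters for a language like Finnish where compounding and inflection make words long.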

Zache added a comment. Feb 23 2017, 7:54 PM

I added some more common bad words from fiwiki's abuse filter rules. I think there could be more, though, if more is better.

More is generally better. This isn't the last chance to extend the list, though it may be the last chance to extend it directly on the wiki. Future extensions will need to happen in code, but that isn't very difficult. See English Wikipedia's test set for the words we try to match there:

I added a few more words. I think it's now ready to be encoded in the modeling library.

4shadoww updated the task description. Feb 24 2017, 7:17 AM
Restricted Application added a project: User-Ladsgroup. Feb 26 2017, 12:58 AM
Ladsgroup moved this task from Review to Done on the Machine Learning Platform (Current) board.
Halfak closed this task as Resolved. Mar 16 2017, 9:21 PM