
Add language support for Korean
Closed, Resolved, Public

Description

Event Timeline

revi renamed this task from Add language support for ... to Add language support for Korean. Mar 17 2017, 3:06 PM

@revi put together this list based on some other sources: P5073

(Just FYI:) P5072 was added a few minutes before halfak made P5073, and P5072 is the authoritative list.

@revi, I almost pulled this to our main workboard, but I realized that we still need a list of "informals". @Ladsgroup said that he's updated https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/ko with a new run of BWDS. Could you have a look at it to see if it is any more useful?

Alternatively, you could help us build a list of informals from your own knowledge. See the English informals for a large set of examples of the kind of thing we're looking for. https://github.com/wiki-ai/revscoring/blob/master/revscoring/languages/tests/test_english.py#L87

Unfortunately, I have to say that the updated BWDS run is still meaningless except for one entry.

Also, the informals list is what I was going to work on tomorrow.

Gotcha. Sounds good. Sorry for the BWDS issues for Korean. I've been working on that a lot in the last week.

Halfak triaged this task as High priority. Mar 23 2017, 2:53 PM
Halfak moved this task from Unsorted to New development on the Machine-Learning-Team board.

I know the list is broad, but paragraphs ending with the following words are very likely to be informal and not encyclopedic, so P5122 is the list. (The list is quite small, so I'll need to adjust it quite often.)
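To illustrate the heuristic described above, here is a minimal sketch of a matcher that flags a paragraph when its text ends with one of a given set of endings, ignoring trailing punctuation. The function names and the two sample endings are hypothetical stand-ins chosen for illustration; they are not the contents of P5122, and this is not necessarily how revscoring applies the list.

```python
import re


def make_informal_ending_matcher(endings):
    """Build a regex matching text that ends with one of `endings`.

    `endings` is whatever list is supplied by the caller; nothing here
    is taken from P5122.
    """
    if not endings:
        raise ValueError("endings must not be empty")
    # Try longer endings first so the longest alternative wins.
    ordered = sorted(endings, key=len, reverse=True)
    alternatives = "|".join(re.escape(e) for e in ordered)
    # Allow trailing whitespace and common sentence punctuation.
    return re.compile(r"(?:" + alternatives + r")[\s.!?~]*$")


def ends_informally(paragraph, matcher):
    """Return True if the paragraph ends with one of the listed endings."""
    return matcher.search(paragraph) is not None


# Hypothetical sample endings (polite-style sentence endings), for demo only.
HYPOTHETICAL_ENDINGS = ["입니다", "해요"]
matcher = make_informal_ending_matcher(HYPOTHETICAL_ENDINGS)
```

With this sketch, `ends_informally("이것은 사과입니다.", matcher)` is true, while the same sentence in plain style, `"이것은 사과이다."`, is not flagged. Because the list is expected to change often, the endings are passed in rather than hard-coded.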

```
>>> import enchant
>>> ko = enchant.Dict('ko')
>>> ko.check("foo")
True
>>> ko.check("foo asndals")
False
>>> ko.check("fooasndals")
True
>>> ko.check("fooasndals;sfdfaslnflasndlas")
False
>>> ko.check("fooasndalssfdfaslnflasndlas")
True
```
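The transcript above suggests that the `ko` dictionary accepts arbitrary runs of Latin letters as long as they contain no space or punctuation. One possible guard, offered purely as a sketch and not something proposed in this thread, is to consult the dictionary only for tokens made up entirely of precomposed Hangul syllables (U+AC00 to U+D7A3). The helper names `is_hangul_word` and `check_korean` are hypothetical, and the dictionary lookup is passed in as a plain callable so the sketch does not depend on a particular enchant setup.

```python
def is_hangul_word(token):
    """Return True if token is non-empty and consists only of
    precomposed Hangul syllables (U+AC00..U+D7A3)."""
    return bool(token) and all("\uac00" <= ch <= "\ud7a3" for ch in token)


def check_korean(token, dict_check):
    """Consult `dict_check` only for all-Hangul tokens.

    `dict_check` is any callable taking a string and returning a bool,
    for example the `ko.check` method from the transcript (assumption).
    Non-Hangul tokens are rejected without a dictionary lookup.
    """
    return is_hangul_word(token) and dict_check(token)
```

Under this assumption, `check_korean("fooasndals", ko.check)` would return `False` without ever reaching the dictionary. Note that the range check also rejects tokens mixing Hangul with digits or Latin letters, which may or may not be desirable.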