Review Japanese Morphological Libraries
Open, Medium, Public

Assigned To

None

Authored By

	TJones
	Oct 24 2017, 4:47 PM

Description

Based on research in T171652, look at the following in more detail as possible candidates for creating Elasticsearch language analyzer plugins.

MeCab https://github.com/taku910/mecab/tree/master/mecab
tinysegmenter https://pypi.python.org/pypi/tinysegmenter
CaboCha http://taku910.github.io/cabocha/

Related Objects

Status	Assigned	Task
Invalid	None	T174065 [FY 2017-18 Objective] Improve support for searching in multiple languages
Open	None	T154511 [Tracking] Research, test, and deploy new language analyzers
Resolved	TJones	T171652 Language Analysis Morphological Library Research Spike
Open	None	T178923 Review Japanese Morphological Libraries

Event Timeline

TJones created this task. Oct 24 2017, 4:47 PM

TJones mentioned this in T171652: Language Analysis Morphological Library Research Spike. Oct 24 2017, 4:52 PM

TJones edited projects, added Discovery-Search; removed Discovery-Search (Current work). Oct 24 2017, 5:15 PM

TJones moved this task from needs triage to Up Next on the Discovery-Search board.

Liuxinyu970226 added a subscriber: Rxy. Mar 9 2018, 10:14 AM

TJones moved this task from Up Next to search-icebox on the Discovery-Search board. Nov 13 2018, 6:33 PM

We've moved on to other tasks and aren't spending time looking at morphological libraries these days.

However, I may spend some 10% time reviewing the Kuromoji analyzer for Japanese that is endorsed by Elasticsearch, which we previously decided not to use. There may have been improvements since I last looked at it, and after working on Nori for Korean I have a few new insights into sources of trouble and how to find them (and create custom filters and/or open upstream bugs to fix them), so maybe Kuromoji can pass muster. It would be a lot less work to deploy than finding a third-party analyzer, porting it to an Elasticsearch plugin, and maintaining it.

TJones removed TJones as the assignee of this task. Nov 13 2018, 6:51 PM

TJones moved this task from search-icebox to Language Stuff on the Discovery-Search board. Jan 29 2019, 7:16 PM

Liuxinyu970226 added subscribers: Shirayuki, whym. Feb 22 2019, 5:10 AM

TJones mentioned this in T317476: Filter and sort search results of Japanese kana search queries in accordance with how much of the query appears as a consecutive substring. Sep 12 2022, 6:38 PM
