Based on research in T171652, look at the following in more detail as possible candidates for creating Elasticsearch language analyzer plugins.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Invalid | | None | T174065 [FY 2017-18 Objective] Improve support for searching in multiple languages
Open | | None | T154511 [Tracking] Research, test, and deploy new language analyzers
Resolved | | TJones | T171652 Language Analysis Morphological Library Research Spike
Open | | None | T178923 Review Japanese Morphological Libraries
Event Timeline
We've moved on to other tasks and aren't spending time looking at morphological libraries these days.
However, I may spend some 10% time reviewing the Kuromoji analyzer for Japanese that is endorsed by Elasticsearch, which we previously decided not to use. There may have been improvements since I last looked at it, and after working on Nori for Korean I have a few new insights into sources of trouble and how to find them (and create custom filters and/or open upstream bugs to fix them), so maybe Kuromoji can pass muster. It would be a lot less work to deploy than finding a third-party analyzer, porting it to an Elasticsearch plugin, and maintaining it.