Review the Ukrainian analyzers previously found and look for others. Then we'll test the analyzers to see if they really are better.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Invalid | | None | T174065 [FY 2017-18 Objective] Improve support for searching in multiple languages
Open | | None | T154511 [Tracking] Research, test, and deploy new language analyzers
Resolved | | TJones | T160105 [Research spike, 4 hours] Research Ukrainian language analyzers
Resolved | | TJones | T160106 Test and analyze new Ukrainian language analyzers
Resolved | | EBernhardson | T162055 Deploy New Ukrainian Analyzer & Re-index Ukrainian Wikis
Event Timeline
Ukrainian
https://www.elastic.co/guide/en/elasticsearch/plugins/5.1/analysis-ukrainian.html (5.1.2)
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-ukrainian.html (5.2)
Elastic-supported plugin, based on Morfologik.
https://github.com/vhyza/elasticsearch-analysis-lemmagen (1 month)
https://bitbucket.org/hlavki/jlemmagen (2014)
LemmaGen, lemmatization for Ukrainian +14 others, in Java
https://www.linkedin.com/pulse/efficient-search-your-local-language-roman-ora%C4%8D (2016)
Blog post on using LemmaGen (for Slovene)
The Ukrainian files claim to be "free" (per the link from bitbucket.org to Multext-East: http://nl.ijs.si/ME/V4/), but I didn't find specific licensing info.
https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer (10 months)
Only up to ES 2.2.1; MIT license.
Based on https://issues.apache.org/jira/browse/LUCENE-7287 , this is what became the ES 5 plugin.
https://github.com/vgrichina/elasticsearch-ukrainian-stemmer (4 years)
No license; very old.
There aren't many options out there. The ES Morfologik plugin seems to be the most popular by far and would be the easiest to support, so my current plan is to test that and, if it is good, run with it. If not, we can look back here for other options.
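For the testing step, here is a minimal sketch of how the plugin's output could be inspected through the Elasticsearch `_analyze` API. It assumes a local test cluster with the analysis-ukrainian plugin installed, which registers an analyzer named `ukrainian`. The endpoint URL and the helper names (`build_analyze_request`, `extract_tokens`, `analyze`) are hypothetical and only for illustration; this is not the actual test harness.

```python
import json
import urllib.request

# Assumption: a local test cluster with the analysis-ukrainian plugin installed.
ES_URL = "http://localhost:9200"  # hypothetical endpoint


def build_analyze_request(text, analyzer="ukrainian", base_url=ES_URL):
    """Build (without sending) a POST request for the _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_json):
    """Pull the token strings out of an _analyze response dict."""
    return [t["token"] for t in response_json.get("tokens", [])]


def analyze(text, analyzer="ukrainian", base_url=ES_URL):
    """Send the request and return the tokens; needs a running cluster."""
    req = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(req) as resp:
        return extract_tokens(json.load(resp))
```

Running `analyze` on the same text with `analyzer="ukrainian"` and with `analyzer="standard"` would give two token lists to compare side by side, which is one simple way to see what the new analyzer changes.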