Two ways to start:
- Languages we support poorly and most want to improve (e.g. spaceless languages)
- Analyzers that we know to be very mature (e.g. there's a Polish analyzer that @dcausse knows about and likes)
Things to consider:
- How much better the analyzer is than what we currently use
- Maintainability of the analyzer's code
- [add more!]
Languages/analyzers to consider (from T155549):
- Polish: Elastic says theirs "provides high quality stemming for Polish", and it's probably easy. (T154516 / T154517)
- Chinese: we really need this, and we know of SmartCN and others to consider. (T158202 / T158203)
- Ukrainian: Elastic has one, though it only "provides stemming for Ukrainian" (no "high quality" claim); we're currently using Russian, which is better than nothing but not at all great. (T160105 / T160106)
- Hebrew: recently requested/suggested, and Elastic suggests HebMorph as well. (T162739 / T162741)
- Japanese: we're using CJK analysis in production, which is just bigrams. Maybe Elastic's Kuromoji is better? (T166731)
- Vietnamese: the only one left on the list recommended/linked-to by Elastic. (T170423)
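For anyone unfamiliar with what "just bigrams" means in the Japanese item above, here is a rough Python sketch. It is only an illustration, not the production CJK analyzer: the function name `cjk_bigrams` is made up, and it assumes the input is a single unbroken run of CJK characters.

```python
def cjk_bigrams(run):
    """Split one contiguous run of CJK characters into overlapping bigrams.

    Hypothetical helper for illustration only; real analyzers also handle
    mixed scripts, punctuation, and normalization.
    """
    if len(run) < 2:
        # A lone character (or empty string) cannot form a bigram.
        return [run] if run else []
    # Slide a two-character window over the run, one character at a time.
    return [run[i:i + 2] for i in range(len(run) - 1)]


# "東京都庁" (Tokyo Metropolitan Government Office)
print(cjk_bigrams("東京都庁"))  # ['東京', '京都', '都庁']
```

Note that the middle token, 京都 (Kyoto), is not a word in the original text; it only appears because the window straddles a word boundary. Spurious tokens like this are the kind of thing a dictionary-based or morphological analyzer is meant to avoid.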
Previously a 2016/17 Q3 Goal.
Previously a 2016/17 Q4 Goal.
Currently a 2017/18 Q1 Goal.