[Epic, Q1 Goal] Research, test, and deploy new language analyzers
Open, Normal, Public

Description

Two ways to start:

  • Focus on languages that we support poorly and most want to improve (e.g. spaceless languages)
  • Test analyzers that we know to be very mature (e.g. the Polish analyzer that @dcausse knows about and likes)
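
To illustrate why spaceless languages are hard to support, here is a minimal sketch. It assumes a naive whitespace tokenizer as the baseline and overlapping character bigrams as a crude fallback. Both functions are hypothetical helpers written for this example, not the tokenizers actually deployed.

```python
def whitespace_tokens(text):
    """Naive baseline: split on whitespace only."""
    return text.split()


def char_bigrams(text):
    """Crude fallback for spaceless text: overlapping character bigrams."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


sample = "維基百科"
# The whitespace tokenizer returns the whole string as one token,
# so a search for a word inside it cannot match.
print(whitespace_tokens(sample))
# Bigrams at least expose shorter units that a query can match.
print(char_bigrams(sample))
```

A real analyzer for such a language would segment text into words rather than fixed-size chunks; the bigram fallback only shows the size of the gap.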

Things to consider:

  • How much better the analyzer is than what we currently have
  • How maintainable the analyzer's code is
  • [add more!]
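
One rough way to put a number on "how much better" is to count how many distinct word forms each analyzer collapses into the same term. The sketch below assumes an analyzer can be modelled as a function from a word to its indexed form. `baseline` and `candidate` are toy stand-ins invented for this example, and the metric is only one of many possible ones.

```python
from collections import defaultdict


def group_by_stem(words, analyze):
    """Group surface forms by the term the analyzer produces for them."""
    groups = defaultdict(set)
    for word in words:
        groups[analyze(word)].add(word)
    return dict(groups)


def compare_analyzers(words, baseline, candidate):
    """Report how many groups each analyzer produces for the same words."""
    return {
        "distinct_words": len(set(words)),
        "baseline_groups": len(group_by_stem(words, baseline)),
        "candidate_groups": len(group_by_stem(words, candidate)),
    }


def baseline(word):
    # Toy stand-in: lowercase only, no stemming.
    return word.lower()


def candidate(word):
    # Toy stand-in: lowercase plus very crude suffix stripping.
    w = word.lower()
    for suffix in ("ing", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w


sample = ["Search", "searching", "searches", "index", "indexes"]
report = compare_analyzers(sample, baseline, candidate)
print(report)
```

Fewer groups is not automatically better: the groups still need manual review to check that the merged forms really belong together.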

Languages/analyzers to consider (from T155549):

Previously a 2016/17 Q3 Goal.
Previously a 2016/17 Q4 Goal.
Currently a 2017/18 Q1 Goal.

Related Objects

Status     Assigned       Task
Open       None
Open       None
Resolved   TJones
Resolved   TJones
Resolved   EBernhardson
Resolved   TJones
Resolved   TJones
Resolved   TJones
Resolved   TJones
Resolved   None
Resolved   TJones
Resolved   TJones
Resolved   TJones
Resolved   EBernhardson
Resolved   TJones
Resolved   TJones
Resolved   Gehel
Resolved   dcausse
Resolved   debt
Resolved   TJones
Declined   TJones
Resolved   TJones
Open       TJones
Deskana created this task. Jan 3 2017, 7:43 PM
Deskana renamed this task from [EPIC] Research, test, and deploy new language analysers to [Epic, Q3 Goal] Research, test, and deploy new language analysers. Jan 3 2017, 7:44 PM
Deskana triaged this task as Normal priority.
Deskana moved this task from Needs triage to Current work on the Discovery-Search board.
Deskana added a project: Epic.
TJones added a subscriber: TJones. Jan 11 2017, 6:04 PM
This comment was removed by TJones.
TJones updated the task description. Jan 11 2017, 6:40 PM

HebMorph was recommended by @Matanya. It was investigated some time ago by Matanya and Nik (@Manybubbles). It's being actively developed and Matanya knows the developer.

TJones updated the task description. Jan 24 2017, 9:18 PM
TJones added a comment. Edited Jan 26 2017, 5:29 PM

While researching analyzers, I came across others. I didn't really investigate most of them, so this list is just a starting point for anyone who wants to look more closely at any of these.

General
https://www.elastic.co/guide/en/elasticsearch/plugins/5.1/analysis.html (ES 5.1)
List of Elastic analysis plugins (internal and 3rd-party): Japanese, several for Chinese, Polish, Ukrainian, Hebrew, Russian, English, Vietnamese, and some technical ones.

Polish
See T154516.

Chinese
See T158202.

Ukrainian
See T160105.

Hebrew
See T162739.

Japanese
https://www.elastic.co/guide/en/elasticsearch/plugins/5.1/analysis-kuromoji.html (v5.1.2)
https://www.elastic.co/guide/en/elasticsearch/plugins/master/analysis-kuromoji.html (v6.0.0a)
test here (v?): http://www.atilika.org/

Vietnamese
https://github.com/duydo/elasticsearch-analysis-vietnamese (3 months)
linked by Elastic

Thai
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html
ICU Analysis plugin, "including better analysis of Asian languages"
Mentioned elsewhere that it covers Thai as well.
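
As a starting point for testing, the sketch below builds index settings that define a custom analyzer on top of the ICU plugin's `icu_tokenizer` and `icu_folding` filter. It assumes the analysis-icu plugin is installed on the cluster. The analyzer name `icu_text` is a placeholder invented here, and whether this combination actually works well for Thai is untested.

```python
import json


def icu_index_settings(analyzer_name="icu_text"):
    """Build index settings with a custom ICU-based analyzer.

    analyzer_name is a hypothetical placeholder; the tokenizer and
    filter names come from the analysis-icu plugin.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "icu_tokenizer",
                        "filter": ["icu_folding"],
                    }
                }
            }
        }
    }


# The JSON body could be sent when creating a test index.
print(json.dumps(icu_index_settings(), indent=2))
```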

Phonetic analysis
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic.html (v5.1.2)
https://www.elastic.co/guide/en/elasticsearch/plugins/master/analysis-phonetic.html (v6.0.0a)
“Soundex, Metaphone, and a variety of other algorithms”, presumably English
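
For a sense of what phonetic analysis does, here is a minimal sketch of classic American Soundex. It is written from the textbook rules for illustration only; the variant the plugin ships may differ in details, and the algorithm is tuned for English names.

```python
# Consonant classes used by classic American Soundex.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code for an ASCII word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate same-coded consonants
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (encoded + "000")[:4]


# Similar-sounding names map to the same code.
print(soundex("Robert"), soundex("Rupert"))
```

Indexing such codes alongside the original tokens would let a query for one spelling match the other, at the cost of more false positives.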

Misc
https://github.com/yakaz/elasticsearch-analysis-combo (2 years)
combines multiple language analyzers

TJones updated the task description. Feb 14 2017, 9:56 PM
TJones updated the task description. Feb 15 2017, 3:54 PM
mxn added a subscriber: mxn. Apr 11 2017, 7:15 AM
TJones updated the task description. Apr 11 2017, 7:46 PM
TJones renamed this task from [Epic, Q3 Goal] Research, test, and deploy new language analysers to [Epic, Q3 Goal, Q4 Goal] Research, test, and deploy new language analysers. Apr 11 2017, 9:01 PM
TJones updated the task description. Jun 23 2017, 6:41 PM
TJones renamed this task from [Epic, Q3 Goal, Q4 Goal] Research, test, and deploy new language analysers to [Epic, Q1 Goal] Research, test, and deploy new language analyzers. Jul 12 2017, 1:25 PM
TJones updated the task description.
TJones updated the task description. Jul 21 2017, 8:32 PM