We've pretty much made it through the list of analyzers recommended by, or at least pointed to by, Elastic as part of {T154511} (other than deploying HebMorph, which is coming, really!).
Several of the analyzer plugins are mainly wrappers around third-party open-source morphological libraries. So, maybe it wouldn't be that hard to wrap a plugin around another existing open-source morphological library. Of course, it would depend on details of the library: grammatical completeness, programming language, code maturity, how well maintained it is, etc.
Below I've put together a heuristically sorted list of languages currently without language-specific analyzers. //<nerd>I ranked by [[ https://en.wikipedia.org/wiki/List_of_Wikipedias#Detailed_list | number of articles ]] in the respective Wikipedias (W) and ranked by [[ http://discovery.wmflabs.org/metrics/#langproj_breakdown | volume of search requests ]] (S). The final ranking is (300-W)*(300-S)^1.5.</nerd>// This ranking takes into account both Wikipedia size and search volume, with a higher weight on search volume. The article-count outliers Cebuano and Waray are still on the list, but much farther down than article count alone would place them. It's probably a good thing that the two recently abandoned analyzers, Japanese and Vietnamese, are at the top of the list, because it at least hints that our old list and new list mesh reasonably well.
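As a rough illustration of the ranking formula, here is a minimal Python sketch. It assumes W and S are rank positions where 1 is the largest wiki or the highest search volume, and that a higher score sorts earlier; the formula itself does not spell out either point. The function names and the language codes and ranks in `example_ranks` are invented for illustration and are not the real data.

```python
def combined_score(article_rank, search_rank, ceiling=300, search_exponent=1.5):
    """Score from (300-W)*(300-S)^1.5. Assumes ranks are below `ceiling`,
    since a negative base with a fractional exponent is not a real number."""
    return (ceiling - article_rank) * (ceiling - search_rank) ** search_exponent


def rank_languages(ranks):
    """Return language codes sorted by combined score, best first (assumed order)."""
    return sorted(ranks, key=lambda code: combined_score(*ranks[code]), reverse=True)


# Hypothetical (W, S) rank pairs, made up for this example only.
example_ranks = {
    "xx": (2, 80),   # very many articles, comparatively little search traffic
    "yy": (20, 10),
    "zz": (40, 5),
}

print(rank_languages(example_ranks))
```

With these made-up numbers, "xx" would come first by article count alone, but the heavier weight on search volume pushes it below the other two, which is the effect described for Cebuano and Waray.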
The goal of the research spike would be to time-box an investigation of these languages (say, two days) in order to try to answer these questions:
* Are they actually using the default non-language-specific analyzer? (probably, but if not, document!)
* Do open-source Elasticsearch plugins exist for these languages? (probably not, but if so, get excited!)
* Do other open-source morphological libraries for these languages exist? (if so, document them here!)
An initial research spike would probably not be enough time to evaluate all of them (unless things go very poorly), but would give a sense of what's out there and whether it's worth it to continue with this line of investigation.
Depending on what exists, how mature the code and coverage are, and other factors, it might be worthwhile to spin off separate tasks to more deeply assess particular morphological libraries, to try to wrap them into Elasticsearch plugins, or to encourage volunteers to do so, etc.
A few additional notes:
* There are certainly other approaches that might make sense on a case-by-case basis. For particularly similar languages, it may be possible, and even easier, to adapt an existing morphological library from one language to the other. Indonesian to Malay might be a candidate.
* There are also possibly varieties listed or not listed here that should be considered together, like maybe Serbian, Croatian, and Serbo-Croatian.
The top 50 languages on my list are:
* Japanese (ja)
* Vietnamese (vi)
* Korean (ko)
* Serbian (sr)
* Malay (ms)
* Estonian (et)
* Slovak (sk)
* Tagalog (tl)
* Tamil (ta)
* Croatian (hr)
* Serbo-Croatian (sh)
* Belarusian (be)
* Georgian (ka)
* Azerbaijani (az)
* Kazakh (kk)
* Urdu (ur)
* Latin (la)
* Esperanto (eo)
* Malayalam (ml)
* Telugu (te)
* Bengali (bn)
* Cebuano (ceb)
* Uzbek (uz)
* Albanian (sq)
* Bosnian (bs)
* Marathi (mr)
* Macedonian (mk)
* Cantonese (zh-yue)
* Afrikaans (af)
* Welsh (cy)
* Gujarati (gu)
* Burmese (my)
* Kannada (kn)
* Breton (br)
* Icelandic (is)
* Sinhalese (si)
* Swahili (sw)
* Tatar (tt)
* Tajik (tg)
* Kurdish (Kurmanji) (ku)
* Mongolian (mn)
* Luxembourgish (lb)
* Scots (sco)
* Eastern Punjabi (pa)
* Nepali (ne)
* Egyptian Arabic (arz)
* Sicilian (scn)
* Occitan (oc)
* Waray (war)
* Asturian (ast)