
[Research spike, 4 hours] Research Hebrew language analyzers
Closed, Resolved, Public

Description

Review the Hebrew analyzers found previously and look for others. Then we'll test the analyzers to see if they really are better.

Event Timeline

The short version: the HebMorph-based analyzer it is!

https://github.com/synhershko/elasticsearch-analysis-hebrew (5 days)

  • Based on HebMorph, linked by Elastic, available for ES5.3
  • Offers a separate lemmatizer, a Niqqud (diacritic) character filter (allowing for unpacking if needed), and several levels of analyzers.
  • Commercial option includes proprietary dictionary.
  • Does not have an obvious stop word list, but it'll be easy enough to tell if there is one when I do analysis later.
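To illustrate what a Niqqud character filter does, here is a minimal Python sketch. It assumes the filter simply drops Hebrew combining marks (vowel points and cantillation in U+0591–U+05C7 with Unicode category Mn) while leaving Hebrew punctuation such as maqaf (U+05BE) intact. The function name `strip_niqqud` is hypothetical. This is not HebMorph's implementation and does not cover its unpacking behavior.

```python
import unicodedata


def strip_niqqud(text: str) -> str:
    """Hypothetical helper: remove Hebrew niqqud and cantillation marks.

    Only characters in U+0591..U+05C7 whose Unicode category is Mn
    (non-spacing mark) are dropped. Punctuation in the same range
    (maqaf, paseq, sof pasuq, nun hafukha) is kept.
    """
    return "".join(
        ch
        for ch in text
        if not ("\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn")
    )


if __name__ == "__main__":
    pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"  # "shalom" with vowel points
    print(strip_niqqud(pointed))  # bare consonants and vav
```

A lemmatizer would still be needed on top of this. Stripping diacritics only normalizes pointed and unpointed spellings to the same surface form.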

On stopwords:

It's pretty hard to find anything else for Hebrew...