Install and unpack Bengali analyzer
Closed, Resolved · Public · 8 Estimated Story Points

Description

User Story: As a user of Bengali-language wikis, I want to have better Bengali language analysis so I see better search results (particularly, better recall).

Elasticsearch provides a Bengali language analyzer, but we don't currently use it for Bengali-language projects. We should enable it, have the performance verified by speakers, and then unpack it.
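For illustration, "unpacking" means replacing the monolithic `bengali` analyzer with an equivalent custom analyzer built from its individual components, so that further filters can be added later. The sketch below follows the rebuilt Bengali analyzer described in the Elasticsearch reference documentation, minus its optional keyword-marker filter. The analyzer name `unpacked_bengali` and the function name are hypothetical, and this is not necessarily the configuration the CirrusSearch patch generates.

```python
import json


def unpacked_bengali_settings():
    """Build index settings for an unpacked Bengali analyzer.

    Assumption: the filter chain mirrors the rebuilt `bengali` analyzer
    from the Elasticsearch docs, without the optional keyword_marker
    filter. The analyzer name "unpacked_bengali" is illustrative only.
    """
    return {
        "analysis": {
            "filter": {
                # Built-in Bengali stop word list.
                "bengali_stop": {"type": "stop", "stopwords": "_bengali_"},
                # Built-in Bengali stemmer.
                "bengali_stemmer": {"type": "stemmer", "language": "bengali"},
            },
            "analyzer": {
                "unpacked_bengali": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "decimal_digit",
                        "indic_normalization",
                        "bengali_normalization",
                        "bengali_stop",
                        "bengali_stemmer",
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    # Print the settings as JSON, e.g. for use in an index creation request.
    print(json.dumps(unpacked_bengali_settings(), indent=2))
```

Once the analyzer is expressed this way, general upgrades such as ICU normalization or homoglyph normalization could be added as extra entries in the filter list.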

Acceptance Criteria:

  • Bengali speakers verify reasonable performance of the stemmer
  • Unpacked analyzer performs the same as the monolithic version (without general upgrades).
  • Upgraded analyzer either has no unexpected impact (we know what to expect from ICU norm and homoglyph norm, for example), or the impact is reviewed by a speaker of the language.
  • Analysis changes are deployed, a re-indexing sub-task is created off this task's parent (T272606), and linked to in T147505.
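The second criterion (unpacked output matches the monolithic output) can be checked mechanically. The following is a minimal sketch, assuming each analyzer is available as a callable that maps a string to a list of tokens; in practice such a callable might wrap the Elasticsearch `_analyze` API. The function name and the stand-in analyzers are hypothetical and exist only to make the example runnable.

```python
def compare_analyzers(samples, analyze_a, analyze_b):
    """Return (text, tokens_a, tokens_b) for every sample where output differs.

    `analyze_a` and `analyze_b` are assumed to be callables taking a string
    and returning a list of tokens.
    """
    diffs = []
    for text in samples:
        tokens_a = analyze_a(text)
        tokens_b = analyze_b(text)
        if tokens_a != tokens_b:
            diffs.append((text, tokens_a, tokens_b))
    return diffs


if __name__ == "__main__":
    # Stand-in analyzers for demonstration; real ones would call the
    # monolithic and unpacked analyzers on the search cluster.
    def monolithic(text):
        return text.lower().split()

    def unpacked(text):
        return text.lower().split()

    sample_texts = ["বাংলা উইকিপিডিয়া", "Bengali Analyzer Test"]
    differences = compare_analyzers(sample_texts, monolithic, unpacked)
    print(f"{len(differences)} differing sample(s)")
```

An empty result over a representative sample of wiki text would support the claim that unpacking changed nothing; any non-empty result lists the exact inputs to review.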

Event Timeline

Change 816876 had a related patch set uploaded (by Tjones; author: Tjones):

[mediawiki/extensions/CirrusSearch@master] Enable, Unpack, & Customize Bengali Analyzer

https://gerrit.wikimedia.org/r/816876

Change 816876 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Enable, Unpack, & Customize Bengali Analyzer

https://gerrit.wikimedia.org/r/816876