Maniphest T294067

Install and unpack Bengali analyzer
Closed, Resolved; Public; 8 Estimated Story Points

Assigned To

	TJones

Authored By

	TJones
	Oct 21 2021, 9:43 PM

Referenced Files

None

Description

User Story: As a user of Bengali-language wikis, I want better Bengali language analysis so that I see better search results (in particular, better recall).

Elasticsearch provides a Bengali language analyzer, but we don't currently use it for Bengali-language projects. We should enable it, have the performance verified by speakers, and then unpack it.
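
To make "unpacking" concrete, here is a minimal sketch of the two index-settings shapes involved: the built-in (monolithic) `bengali` analyzer, and an equivalent custom analyzer assembled from its parts. The function names and the analyzer name `text` are hypothetical, and this is not the CirrusSearch implementation. The filter chain follows the "rebuilt" Bengali analyzer as described in the Elasticsearch documentation (with the optional keyword-marker filter omitted), so it should be checked against the Elasticsearch version in use.

```python
import json


def monolithic_bengali_settings():
    """Settings that use the built-in (monolithic) Bengali analyzer."""
    return {"analysis": {"analyzer": {"text": {"type": "bengali"}}}}


def unpacked_bengali_settings():
    """Settings that spell out the same analysis chain as a custom analyzer.

    Once unpacked, extra filters (for example ICU or homoglyph
    normalization) can be inserted at a chosen position in the list.
    """
    return {
        "analysis": {
            "filter": {
                "bengali_stop": {"type": "stop", "stopwords": "_bengali_"},
                "bengali_stemmer": {"type": "stemmer", "language": "bengali"},
            },
            "analyzer": {
                "text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "decimal_digit",
                        "indic_normalization",
                        "bengali_normalization",
                        "bengali_stop",
                        "bengali_stemmer",
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(unpacked_bengali_settings(), indent=2))
```

The monolithic form is opaque: its components cannot be reordered or supplemented. The unpacked form is what makes later customization possible.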

Acceptance Criteria:

Bengali speakers verify reasonable performance of the stemmer.
Unpacked analyzer performs the same as the monolithic version (without general upgrades).
Upgraded analyzer either has no unexpected impact (we know what to expect from ICU normalization and homoglyph normalization, for example), or the impact is reviewed by a speaker of the language.
Analysis changes are deployed, a re-indexing sub-task is created off this task's parent (T272606), and linked to in T147505.
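
The "performs the same" criterion can be checked mechanically by analyzing the same words with both analyzers and listing any differences. The sketch below is one hypothetical way to do that. `compare_analyzers` and the two stand-in analyzers are illustrative only and do not perform real Bengali analysis. In practice each callable would wrap a request to the Elasticsearch `_analyze` API for the corresponding analyzer.

```python
def compare_analyzers(words, analyze_old, analyze_new):
    """Return {word: (old_tokens, new_tokens)} for words analyzed differently.

    analyze_old / analyze_new are callables mapping a string to a list of
    output tokens. An empty result means the two analyzers agree on every
    word in the sample.
    """
    diffs = {}
    for word in words:
        old_tokens = analyze_old(word)
        new_tokens = analyze_new(word)
        if old_tokens != new_tokens:
            diffs[word] = (old_tokens, new_tokens)
    return diffs


# Stand-in analyzers for illustration only (not real Bengali analysis):
# the "old" one lowercases; the "new" one also strips a trailing "s".
def toy_old(word):
    return [word.lower()]


def toy_new(word):
    token = word.lower()
    return [token[:-1] if token.endswith("s") else token]


if __name__ == "__main__":
    print(compare_analyzers(["Cats", "dog"], toy_old, toy_new))
```

For the unpacking step the expected result is an empty dictionary. For the upgrade step the differences are the items that need to be explained or reviewed by a speaker.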

Details

	Subject	Repo	Branch	Lines +/-
	Enable, Unpack, & Customize Bengali Analyzer	mediawiki/extensions/CirrusSearch	master	+469 -129

Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T219550 [EPIC] Harmonize language analysis across languages
		Resolved		Gehel	T272606 [EPIC] Unpack all Elasticsearch analyzers
		Resolved		TJones	T294067 Install and unpack Bengali analyzer
		Resolved		TJones	T315265 Reindex Bengali wikis to enable new analyzer

Event Timeline

TJones created this task. Oct 21 2021, 9:43 PM

TJones mentioned this in T272606: [EPIC] Unpack all Elasticsearch analyzers.

MPhamWMF set the point value for this task to 8. Oct 25 2021, 3:39 PM

MPhamWMF moved this task from Incoming to Ready for Dev -- SWE on the Discovery-Search (Current work) board.

TJones claimed this task. Nov 8 2021, 4:14 PM

TJones moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.

TJones moved this task from In Progress to Waiting on the Discovery-Search (Current work) board. Feb 8 2022, 9:02 PM

TJones moved this task from Waiting to In Progress on the Discovery-Search (Current work) board. Feb 16 2022, 8:23 PM

Change 816876 had a related patch set uploaded (by Tjones; author: Tjones):

[mediawiki/extensions/CirrusSearch@master] Enable, Unpack, & Customize Bengali Analyzer

https://gerrit.wikimedia.org/r/816876

gerritbot added a project: Patch-For-Review. Jul 26 2022, 1:04 AM

Change 816876 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Enable, Unpack, & Customize Bengali Analyzer

https://gerrit.wikimedia.org/r/816876

Maintenance_bot removed a project: Patch-For-Review. Jul 26 2022, 5:30 PM

dcausse moved this task from In Progress to To Be Deployed on the Discovery-Search (Current work) board. Aug 1 2022, 3:13 PM

dcausse moved this task from To Be Deployed to In Progress on the Discovery-Search (Current work) board.

The write-up is complete (though the code was merged a while ago): Bengali enabling/unpacking notes.

Reindexing task: T315265: Reindex Bengali wikis to enable new analyzer

Gehel closed this task as Resolved. Aug 29 2022, 2:32 PM

TJones added a subtask: T315265: Reindex Bengali wikis to enable new analyzer. May 1 2023, 6:07 PM