Maniphest T190815

Create Slovak Elasticsearch Plugin/Analysis Chain Using Slovak Stemming Algorithm
Closed, Resolved (Public)

Authored By

	TJones
	Mar 27 2018, 2:34 PM

Description

I'm still waiting to see whether I get any more feedback on the Slovak stemming algorithm, but it seems clear that the "light" stemmer does a good job, and that stripping the naj- prefix is helpful. Some implementation details could change (for example, also stripping pod-, or moving prefix stripping to before or after the stripping of inflectional suffixes), but the basic setup is clear.
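
The prefix-stripping step mentioned above can be illustrated with a minimal sketch. This is not the stemmer's actual code: the function name, the `min_stem_len` guard, and the idea of passing the prefix list as a parameter are all assumptions made for illustration. Only the naj- and pod- prefixes come from the discussion above.

```python
def strip_prefix(word, prefixes=("naj",), min_stem_len=3):
    """Remove the first matching prefix from `word`.

    Hypothetical helper: `min_stem_len` is an assumed guard that keeps
    very short words from being reduced to almost nothing.
    """
    for prefix in prefixes:
        remainder = word[len(prefix):]
        if word.startswith(prefix) and len(remainder) >= min_stem_len:
            return remainder
    return word
```

With the defaults, `strip_prefix("najlepší")` returns `"lepší"`. Whether a step like this should run before or after suffix stripping is the open ordering question noted above; the sketch takes no position on it.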

So, the next step is to add an Elasticsearch plugin based on the stemmer to search/extra (which is license-compatible) and test it as part of an analysis chain.

Details

	Subject	Repo	Branch	Lines +/-
	Add documentation for slovak_stemmer	search/extra	master	+26 -1
	Create Slovak Elasticsearch Plugin	search/extra	master	+488 -0

Related Objects

Status	Assigned	Task
Invalid	None	T174065 [FY 2017-18 Objective] Improve support for searching in multiple languages
Open	None	T154511 [Tracking] Research, test, and deploy new language analyzers
Resolved	TJones	T171652 Language Analysis Morphological Library Research Spike
Resolved	TJones	T178929 Review Slovak Morphological Libraries
Resolved	TJones	T190815 Create Slovak Elasticsearch Plugin/Analysis Chain Using Slovak Stemming Algorithm
Resolved	Gehel	T191543 Deploy updated search/extra plugin and search/extra-analysis-slovak plugin with Slovak Stemmer
Resolved	TJones	T191544 Deploy the analysis config for the new Slovak stemmer
Resolved	TJones	T191545 Re-index Slovak Wikis after analysis chain is deployed

Event Timeline

TJones triaged this task as Medium priority. Mar 27 2018, 2:34 PM

TJones created this task.

TJones moved this task from Incoming to not in use - please delete on the Discovery-Search (Current work) board.

Change 423043 had a related patch set uploaded (by Tjones; owner: Tjones):
[search/extra@master] [WIP] Create Slovak Elasticsearch Plugin

https://gerrit.wikimedia.org/r/423043

gerritbot added a project: Patch-For-Review. Mar 29 2018, 9:06 PM

The update to the search/extra plugin above is a work in progress because it does not yet contain unit tests. However, I was able to use the plugin to test the full analysis chain. The write-up is on MediaWiki. The key points:

The Elasticsearch stemmer behaved exactly like the command-line stemmer.
Adding ICU folding, with exceptions for Slovak characters, looks to be a net positive.
Next steps: deploy the updated search/extra plugin (when ready), deploy the analyzer config after the plugin is deployed, re-index Slovak-language wikis.
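
As a rough illustration of the analysis chain described in the key points above, the sketch below builds Elasticsearch index settings as a Python dict. It is a sketch under stated assumptions, not the deployed configuration. The `slovak_stemmer` filter name comes from the patches on this task, and `icu_folding` with `unicode_set_filter` is the standard option of the ICU analysis plugin. The analyzer name, the custom filter name, the filter order, and the exact set of exempted Slovak letters are assumptions.

```python
import json

# Assumption: the Slovak letters with diacritics that folding should leave
# untouched. The set actually used in production may differ.
SLOVAK_LETTERS = "áäčďéíĺľňóôŕšťúýž"


def slovak_analysis_settings():
    """Return index settings for a hypothetical Slovak analyzer."""
    return {
        "analysis": {
            "filter": {
                # Hypothetical filter name. ICU folding is applied to every
                # character except the Slovak letters listed above.
                "slovak_icu_folding": {
                    "type": "icu_folding",
                    "unicode_set_filter": "[^" + SLOVAK_LETTERS + "]",
                }
            },
            "analyzer": {
                # Hypothetical analyzer name; the filter order is an assumption.
                # Lowercasing runs first, so only lowercase letters need exempting.
                "slovak_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "slovak_stemmer", "slovak_icu_folding"],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(slovak_analysis_settings(), ensure_ascii=False, indent=2))
```

The resulting JSON could be supplied as the `settings` body when creating a test index, provided the plugin that supplies `slovak_stemmer` and the ICU analysis plugin are both installed.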

Change 423043 merged by jenkins-bot:
[search/extra@master] Create Slovak Elasticsearch Plugin

https://gerrit.wikimedia.org/r/423043

TJones moved this task from not in use - please delete to Needs review on the Discovery-Search (Current work) board. Apr 5 2018, 3:32 PM

TJones moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board.

TJones mentioned this in T191544: Deploy the analysis config for the new Slovak stemmer. Apr 5 2018, 4:36 PM

Change 428395 had a related patch set uploaded (by Tjones; owner: Tjones):
[search/extra@master] Add documentation for slovak_stemmer

https://gerrit.wikimedia.org/r/428395

Change 428395 merged by jenkins-bot:
[search/extra@master] Add documentation for slovak_stemmer

https://gerrit.wikimedia.org/r/428395

debt closed this task as Resolved. May 1 2018, 6:16 PM

debt closed subtask T191543: Deploy updated search/extra plugin and search/extra-analysis-slovak plugin with Slovak Stemmer as Resolved. Jun 1 2018, 1:56 PM
