Create Croatian, Serbo-Croatian, and Bosnian Analysis Chains Using Serbian Morphological Libraries
Closed, Resolved (Public)


It should be possible to deploy the Serbian analysis chain with the Serbian stemmer (or some variant) to the Serbo-Croatian (sh/442K Wikipedia articles), Croatian (hr/185K Wikipedia articles), and Bosnian (bs/77K Wikipedia articles) wikis. (See the English Wikipedia article on Serbo-Croatian; the standard varieties of the language differ but share the same basic grammar, so the stemmer should perform as well on them as it does on Serbian.)

The task here is to review the results of the analysis chain and make sure there aren't any surprises. Then, we can re-index wikis in these three languages.

Event Timeline

TJones triaged this task as Medium priority. Apr 17 2018, 6:29 PM
TJones created this task.

Everything looks good to me with the Serbian analyzer and ICU folding enabled, but it still needs speaker review. If everything looks good to the speakers, I'll deploy the analysis chain and then re-index the relevant wikis.
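
For readers unfamiliar with how one analysis chain can be shared across several wikis, here is a minimal sketch of the idea. It is not the CirrusSearch implementation. The filter name `serbian_stemmer`, the analyzer name `text`, the filter order, and the helper `build_analysis_config` are all assumptions made for illustration; only `standard`, `lowercase`, and `icu_folding` are standard Elasticsearch component names.

```python
from copy import deepcopy

# Hypothetical Serbian analysis chain, written as an Elasticsearch-style
# analysis settings fragment. "serbian_stemmer" is an assumed filter name and
# the filter order is illustrative; neither is taken from the actual patch.
SERBIAN_CHAIN = {
    "analyzer": {
        "text": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "serbian_stemmer", "icu_folding"],
        }
    }
}

# Language codes from the task description that would reuse the Serbian chain.
SHARED_CHAIN_LANGUAGES = {"sr", "sh", "hr", "bs"}


def build_analysis_config(lang: str) -> dict:
    """Return an independent copy of the shared chain for a supported wiki language.

    Raises KeyError for language codes that do not share the Serbian chain.
    """
    if lang not in SHARED_CHAIN_LANGUAGES:
        raise KeyError(f"no shared Serbian analysis chain for language {lang!r}")
    # Deep copy so that per-wiki tweaks cannot leak into other languages.
    return deepcopy(SERBIAN_CHAIN)


if __name__ == "__main__":
    for code in sorted(SHARED_CHAIN_LANGUAGES):
        print(code, build_analysis_config(code)["analyzer"]["text"]["filter"])
```

Returning a deep copy per language keeps the four configurations identical by default while still allowing one wiki to diverge later if speaker review turns up a variety-specific problem.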

The full write-up is on MediaWiki.

Speaker feedback is positive, and we are good to go on that front. However, we're refactoring the plugin that provides the analysis, so we're going to finish that and get the Serbian wikis all sorted out (T193734) before enabling this for Croatian, Serbo-Croatian, and Bosnian.

And I just realized that we moved from waiting on speaker feedback to waiting on plugin refactoring, so back to Waiting status for now!

Change 437510 had a related patch set uploaded (by Tjones; owner: Tjones):
[mediawiki/extensions/CirrusSearch@master] Create BSC Analysis Chains using Serbian Stemmer

Change 437510 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Create BSC Analysis Chains using Serbian Stemmer