
Unpack Brazilian (Portuguese) Elasticsearch Analyzer
Closed, Resolved | Public | 2 Estimated Story Points

Description

See parent task for details.

(This is last on the list because it is only used on br.wikimedia.org)

Event Timeline

TJones triaged this task as Medium priority. Dec 13 2022, 7:00 PM
TJones set the point value for this task to 2.
TJones moved this task from needs triage to Language Stuff on the Discovery-Search board.
TJones raised the priority of this task from Medium to High. Mar 6 2023, 6:26 PM

Change 903713 had a related patch set uploaded (by Tjones; author: Tjones):

[mediawiki/extensions/CirrusSearch@master] Unpack Brazilian Portuguese Analysis Chain

https://gerrit.wikimedia.org/r/903713

Full write-up on MediaWiki.

Nothing too exciting is implicit in the analyzer, but the difference between the brazilian and portuguese analyzers is surprising and is not justified by the differences between Brazilian and European Portuguese. Worth looking into. (Will open a ticket shortly.)

In a sample from the one wiki where brazilian is currently used, there is a small increase in the number of words that are analyzed the same, due to ICU folding and ICU normalization.
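
For readers unfamiliar with "unpacking", here is a rough sketch of what it means: the monolithic brazilian analyzer is rebuilt as a custom analyzer from its component parts, following the layout given in the Elasticsearch language analyzer documentation (standard tokenizer, lowercase, Brazilian stop words, Brazilian stemmer; the optional keyword_marker filter is left out). The function name, the analyzer name `unpacked_brazilian`, and the exact placement of the ICU filters are assumptions made for illustration. This is not the configuration from the CirrusSearch patch.

```python
import json


def unpacked_brazilian_settings(use_icu: bool = False) -> dict:
    """Build index settings for a custom analyzer that mirrors the
    built-in 'brazilian' analyzer (hypothetical helper, for illustration)."""
    if use_icu:
        # Assumption: icu_normalizer stands in for lowercase and icu_folding
        # runs last. Both filters require the analysis-icu plugin.
        chain = ["icu_normalizer", "brazilian_stop", "brazilian_stemmer", "icu_folding"]
    else:
        chain = ["lowercase", "brazilian_stop", "brazilian_stemmer"]

    return {
        "analysis": {
            "filter": {
                # Built-in Brazilian stop word list.
                "brazilian_stop": {"type": "stop", "stopwords": "_brazilian_"},
                # Stemmer used by the built-in brazilian analyzer.
                "brazilian_stemmer": {"type": "stemmer", "language": "brazilian"},
            },
            "analyzer": {
                "unpacked_brazilian": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": chain,
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(unpacked_brazilian_settings(use_icu=True), indent=2))
```

Once the chain is spelled out like this, extra filters such as ICU normalization and ICU folding can be slotted in, which is what leads to more words being analyzed the same.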

Change 903713 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Unpack Brazilian Portuguese Analysis Chain

https://gerrit.wikimedia.org/r/903713