See parent task for details.
(This is last on the list because it is only used on br.wikimedia.org)
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
Unpack Brazilian Portuguese Analysis Chain | mediawiki/extensions/CirrusSearch | master | +129 -5 |
Status | Subtype | Assigned | Task |
---|---|---|---|
Open | | None | T219550 [EPIC] Harmonize language analysis across languages |
Resolved | | Gehel | T272606 [EPIC] Unpack all Elasticsearch analyzers |
Resolved | | TJones | T325092 Unpack Brazilian (Portuguese) Elasticsearch Analyzer |
Resolved | | TJones | T333398 Reindex brwikimedia to use new unpacked Brazlian Portuguese analysis chain |
Change 903713 had a related patch set uploaded (by Tjones; author: Tjones):
[mediawiki/extensions/CirrusSearch@master] Unpack Brazilian Portuguese Analysis Chain
Full write-up on MediaWiki.
Nothing too exciting about the analyzer itself, but the difference between the brazilian and portuguese analyzers is surprising, and it is not justified by the actual differences between Brazilian and European Portuguese. Worth looking into. (Will open a ticket shortly.)
In a sample from the one wiki where brazilian is currently used, there is a small increase in the number of words that are analyzed the same, due to ICU folding and ICU normalization.
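For readers unfamiliar with "unpacking": it means replacing the monolithic built-in `brazilian` analyzer with an equivalent custom analyzer built from its component parts, so that extra filters can be added to the chain. The sketch below is only illustrative. The base chain follows the Elasticsearch documentation's description of how to rebuild `brazilian` (with the optional `keyword_marker` filter omitted); the function name, the analyzer name `text`, and the exact placement of the ICU filters are assumptions and are not taken from the CirrusSearch patch.

```python
def unpacked_brazilian_settings(use_icu=False):
    """Build index settings for a custom analyzer equivalent to `brazilian`.

    Hypothetical helper for illustration; not CirrusSearch code.
    """
    filters = {
        # Stop word list and stemmer that the built-in analyzer uses.
        "brazilian_stop": {"type": "stop", "stopwords": "_brazilian_"},
        "brazilian_stemmer": {"type": "stemmer", "language": "brazilian"},
    }
    # Baseline unpacked chain, as described in the Elasticsearch docs.
    chain = ["lowercase", "brazilian_stop", "brazilian_stemmer"]
    if use_icu:
        # Assumption: ICU normalization replaces lowercasing and ICU folding
        # runs last. Both filters come from the analysis-icu plugin.
        chain = ["icu_normalizer", "brazilian_stop",
                 "brazilian_stemmer", "icu_folding"]
    return {
        "analysis": {
            "filter": filters,
            "analyzer": {
                "text": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": chain,
                }
            },
        }
    }
```

Once the chain is expressed this way, adding a step such as ICU folding is a one-line change to the filter list, which is not possible with the packed `brazilian` analyzer.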
Change 903713 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Unpack Brazilian Portuguese Analysis Chain