Reindex Czech, Finnish, Galician wikis to enable unpacked versions
Closed, Resolved | Public | 3 Estimated Story Points

Description

Once T284578 is deployed (probably in MediaWiki_1.37/wmf.22), we can reindex the relevant wikis to activate ICU normalization, ICU folding, and homoglyph normalization.
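
For readers unfamiliar with the terms, here is a rough Python sketch of what folding and homoglyph normalization do to search text. It is only an approximation using the standard library, not the ICU or search-plugin code that the reindex activates. The function names and the tiny look-alike table are hypothetical, and any language-specific exceptions the real filters may apply are ignored.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Approximate accent folding: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Hypothetical, deliberately tiny table of Cyrillic look-alikes mapped to Latin.
# A real homoglyph filter is far more complete and more selective than this.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
})

def normalize_homoglyphs(text: str) -> str:
    """Replace known Cyrillic look-alike letters with their Latin counterparts."""
    return text.translate(HOMOGLYPHS)

print(fold_diacritics("Dvořák"))                 # dvorak
print(fold_diacritics("A Coruña"))               # a coruna
print(normalize_homoglyphs("\u0441hocolate"))    # chocolate
```

With this kind of processing at index and query time, a query typed without accents can match an accented title, which is the effect the before-and-after analysis is meant to measure.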

Current counts are: Czech (8 wikis), Finnish (10 wikis), Galician (5 wikis)

Acceptance Criteria

  • All wikis in the relevant languages are reindexed
  • A before-and-after analysis for each language's Wikipedia is provided

Event Timeline

TJones created this task.
TJones set the point value for this task to 3.

The deployment train was rolled back, so it's not yet time to reindex.

The Czech and Finnish Wikipedia samples showed clear but rather muted impact on user query results. The Galician results are a little more robust and show a more consistent pattern of searchers not using standard accents (rather than just problems with "foreign" diacritics).

The full write-up is on MediaWiki.

@MPhamWMF, if you got what you wanted/needed from the report, this is ready to be moved to "Needs Reporting".