Add ICU_folding filter for EN, FR and EL wiki projects
Closed, Resolved (Public)

Description

Along with the work we're doing on T137830 and T102298 (though slightly different from what T41501 details), we'll want to enable ICU folding (via a configuration flag) for the English and French wiki projects.

This will involve re-running a few tests to be sure we're not breaking anything.

Event Timeline

debt triaged this task as Medium priority. Sep 22 2016, 6:34 PM

Looks like we should turn this on! My analysis is here.

A lot of pronunciations for words now get mapped onto those words, which is cool!
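To illustrate the kind of mapping involved, here is a rough sketch that approximates diacritic folding with Python's standard `unicodedata` module. It is not the ICU folding filter itself, which covers far more cases; the function name `approximate_fold` is hypothetical, and the sample words are made up for illustration rather than taken from the analysis.

```python
import unicodedata


def approximate_fold(text: str) -> str:
    """Rough stand-in for folding (an assumption, not ICU's actual algorithm).

    Decomposes characters, drops combining marks, and casefolds the result.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Illustrative inputs only: accented spellings collapse onto the plain form.
for word in ["café", "Kyōto", "naïve"]:
    print(word, "->", approximate_fold(word))
```

With folding of this sort applied at index and query time, a spelling that carries diacritics and the plain spelling would produce the same token, so either one can match the other.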

A problem has been exposed: in rare cases, "modifier letter apostrophe" is used instead of a straight quote or curly right quote. Before ICU folding these were being indexed wrong; with this patch they will still be indexed wrong, but in a different way. I've opened T146804 to map them, but it may not get done before the re-index happens.
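The mapping proposed for T146804 could look something like the following sketch, which rewrites the modifier letter apostrophe (U+02BC) and the curly right quote (U+2019) to a straight quote (U+0027). This is only an illustration in Python; the names `APOSTROPHE_MAP` and `normalize_apostrophes` are hypothetical, and the real fix would presumably live in the search analysis configuration rather than in standalone code like this.

```python
# Hypothetical mapping table: apostrophe-like characters to a straight quote.
APOSTROPHE_MAP = str.maketrans({
    "\u02BC": "'",  # MODIFIER LETTER APOSTROPHE
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (curly right quote)
})


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe variants so all three spellings index identically."""
    return text.translate(APOSTROPHE_MAP)


# The three variants of the same word collapse to a single form.
variants = ["Hawai\u02BCi", "Hawai\u2019i", "Hawai'i"]
print({normalize_apostrophes(v) for v in variants})
```

Applying a mapping like this before tokenization would mean a query typed with any of the three characters matches text written with any of the others.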

I think this was and should be assigned to @dcausse. I probably should have created a separate task, but all I did was evaluate the effect of turning it on locally and analyze big blobs of text. David knows how to configure it properly for deployment. (Unless someone wants me to do it.)

Change 313838 had a related patch set uploaded (by DCausse):
Enable ICU folding for en, fr and greek by default

https://gerrit.wikimedia.org/r/313838

Added Greek to the list; I thought we agreed to enable it on full-text search as well. Currently it's only enabled on Greek Wikipedia with the completion suggester.

dcausse renamed this task from Add ICU_folding filter for EN and FR wiki projects to Add ICU_folding filter for EN, FR and EL wiki projects. Oct 11 2016, 5:37 PM

Change 313838 merged by jenkins-bot:
Enable ICU folding for en, fr and greek by default

https://gerrit.wikimedia.org/r/313838