Accents are not ignored by autocompletion in fr.wiktionary
Closed, ResolvedPublic

Details

Reference
bz70561

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:42 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz70561.
bzimport added a subscriber: Unknown Object (MLST).
Automatik created this task. Sep 8 2014, 3:36 PM

It's a reasonably easy thing to turn on, but right now it's only enabled for English. We can turn it on if it's the right thing.

Another thing - if I turn on accent ignoring, the behavior is wider than just autocomplete - it applies to search as well. Once that is done, a perfect accent match pulls the result higher in search, but results with mismatched accents still show up.
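To make the ranking behavior described above concrete, here is a minimal Python sketch. It is not CirrusSearch code: the helper names `strip_accents` and `search` are hypothetical, and accent stripping is approximated with Unicode decomposition rather than the real analysis chain.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate accent ignoring: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(titles, query):
    """Hypothetical search: match ignoring accents, rank exact accent matches first."""
    folded_query = strip_accents(query)
    # Every title that matches once accents are ignored is a hit.
    hits = [t for t in titles if strip_accents(t) == folded_query]
    # sorted() is stable and False sorts before True, so titles whose accents
    # match the query exactly move to the front; the rest keep their order.
    return sorted(hits, key=lambda t: t != query)


results = search(["cote", "côté", "côte", "coté"], "côte")
```

With these inputs the exactly matching title is ranked first, while the titles whose accents differ are still returned after it.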

ever->already*

Hmmm... I'll investigate that - since no one has complained about that behavior, I imagine it's correct or at least OK. Either way - if prefix search should have it, I'll file this bug and see about getting it in there. It won't be super soon, but I'll get to it.

I'm not sure what you mean by "no one complained" but I opened this bug after this point: https://fr.wiktionary.org/wiki/Wiktionnaire:Wikid%C3%A9mie/septembre_2014#A_propos_du_moteur_de_recherche

Sorry, I mean that search flattening the accents didn't receive any complaints when we turned CirrusSearch on for frwiktionary a few months ago. At least I don't think anyone complained.

Anyway - I'll have a look at turning on accent squashing for frwiktionary soon.

gerritadmin wrote:

Change 160990 had a related patch set uploaded by Manybubbles:
Add asciifolding to some French analyzers

https://gerrit.wikimedia.org/r/160990

I've added a proposal to flatten all accented characters into non-accented ones for prefix search and exact title matches. It'll require rebuilding the index but that is no big deal.
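As a rough illustration of what flattening accents means for prefix search, here is a small Python sketch. It is an assumption-laden stand-in, not the actual patch: the real change uses the `asciifolding` filter in the search analyzers, whereas this sketch only strips combining marks after Unicode decomposition (so characters such as "œ" or "ø" are left untouched). The function names and the sample titles are hypothetical.

```python
import unicodedata


def flatten_accents(text: str) -> str:
    """Drop combining marks after NFD decomposition (approximation only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_search(titles, query):
    """Hypothetical prefix search that compares accent-flattened, lowercased text."""
    folded_query = flatten_accents(query).lower()
    return [
        title
        for title in titles
        if flatten_accents(title).lower().startswith(folded_query)
    ]


titles = ["été", "étage", "etc", "arbre"]
matches = prefix_search(titles, "ete")
```

Typing the unaccented prefix "ete" now finds "été", which is the autocompletion behavior requested in this task.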

Note: I found out where the other normalization comes from. The French stemmer we use for inexact matches performs the following mappings:
'à', 'á', 'â' -> 'a'
'ô' -> 'o'
'è', 'é', 'ê' -> 'e'
'ù', 'û' -> 'u'
'î' -> 'i'
'ç' -> 'c'

I could, if you believe it is more correct, only perform those mappings for the prefix and exact title matching.
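For comparison, the stemmer mappings listed above can be written as a small Python translation table. This is only a sketch of that restricted mapping: the names `STEMMER_FOLD` and `fold_stemmer_accents` are hypothetical, and the table contains exactly the characters listed in this comment, nothing more.

```python
import unicodedata

# Exactly the mappings listed in the comment; other accented characters
# (for example "ï" or "ë") are deliberately left unchanged.
STEMMER_FOLD = str.maketrans({
    "à": "a", "á": "a", "â": "a",
    "ô": "o",
    "è": "e", "é": "e", "ê": "e",
    "ù": "u", "û": "u",
    "î": "i",
    "ç": "c",
})


def fold_stemmer_accents(text: str) -> str:
    """Apply only the listed mappings, after composing characters to NFC."""
    # NFC ensures "e" + combining acute is treated the same as precomposed "é".
    return unicodedata.normalize("NFC", text).translate(STEMMER_FOLD)


folded = fold_stemmer_accents("français")
```

Unlike full ASCII folding, this table leaves a word such as "naïf" as it is, which is the practical difference between the two options discussed here.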

I can't be sure it's more correct. Could you tell me what is done for fr.wikipedia, please? Accents are flattened on that site too.

gerritadmin wrote:

Change 160990 merged by jenkins-bot:
Add asciifolding to some French analyzers

https://gerrit.wikimedia.org/r/160990

All patches mentioned in this report were merged or abandoned - is there more work left to do here (if yes: please reset the bug report status to NEW or ASSIGNED), or can you close this ticket as RESOLVED FIXED?