
[Story] Allow labels with diacritics to be found when searching using plain ascii
Closed, Resolved (Public)

Description

When entering "Alasehir", items with the label "Alaşehir" should be found without an alias being explicitly defined. Similarly, "Munchen" should find "München". This should work at least for Latin-based diacritics. Not sure whether such "simplification" is applicable to other scripts.

Implementation notes:

  • A quick-and-dirty implementation can be based on converting to NFD (Unicode canonical decomposition) and then stripping all non-ASCII characters. This will turn Ä into A, and so forth. It will, however, break non-Latin scripts completely, so we need a whitelist of language codes (or character ranges).
  • Perhaps Elastic/Lucene already provides this, so we'd get it for free with Cirrus integration.
  • If we want to implement this without Cirrus/Elastic/Lucene, it should go into TermSqlIndex::getSearchKey().
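
The first note can be illustrated with a minimal sketch in Python, used here only for illustration and not as the actual Wikibase code. The function name `simplify_label` and the `LATIN_LANGUAGES` whitelist are hypothetical; the whitelist contents are an assumption and would need to be chosen properly.

```python
import unicodedata

# Hypothetical whitelist of language codes written in the Latin script.
# The real list would have to be decided separately.
LATIN_LANGUAGES = {"de", "en", "fr", "tr"}


def simplify_label(label: str, language: str) -> str:
    """Return an ASCII-only search key for labels in whitelisted languages."""
    if language not in LATIN_LANGUAGES:
        # Stripping non-ASCII characters would destroy other scripts,
        # so labels in non-whitelisted languages are left untouched.
        return label
    # NFD splits a precomposed letter into base letter + combining marks,
    # e.g. "Ä" becomes "A" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", label)
    # Drop everything outside ASCII, which removes the combining marks.
    return "".join(ch for ch in decomposed if ord(ch) < 128)


print(simplify_label("Alaşehir", "tr"))
print(simplify_label("München", "de"))
```

Note that characters without a canonical decomposition (for example ß or the Turkish dotless ı) are dropped entirely by this approach rather than mapped to an ASCII equivalent, which is one reason it is only a rough solution.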

Event Timeline

daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a project: Wikidata.
daniel subscribed.
daniel set Security to None.
daniel added a subscriber: Lydia_Pintscher.
matej_suchanek renamed this task from Story: Allow labels with diacritics to be found when searching using plain ascii to [Story] Allow labels with diacritics to be found when searching using plain ascii. Jan 27 2016, 9:54 AM
matej_suchanek added a project: Story.
jhsoby claimed this task.
jhsoby subscribed.

With the new search, this is how it works, so closing.