
Search engine should take into account a wider class of frequent misspellings to provide better suggestions
Open, Medium, Public

Description

Context

During a discussion about including some rare words in the French Wiktionary, someone pointed out that searching the Wiktionnaire with its default internal search engine for a term like mqmqn currently returns no results.

Popular general-purpose web search engines rightly suggest "did you mean “maman”?". In this case, any knowledgeable reader can see that the person most likely typed on a QWERTY keyboard layout as if it were an AZERTY one.
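As a rough illustration of the keyboard-layout case, the Python sketch below swaps the letter keys that trade places between QWERTY and AZERTY (A/Q and Z/W) and checks whether the result is a known word. The function names and the toy word list are hypothetical, and the sketch does not describe the existing search engine. A complete layout mapping would also need the keys that move elsewhere, such as M and the punctuation keys, which are deliberately left out here.

```python
from __future__ import annotations

# Hypothetical sketch: only the A/Q and Z/W swaps between QWERTY and AZERTY
# are modelled; keys that move elsewhere (M, punctuation) are ignored.
_LAYOUT_SWAP = str.maketrans("aqzwAQZW", "qawzQAWZ")


def layout_swap_candidate(query: str) -> str:
    """Return the query with A<->Q and Z<->W exchanged."""
    return query.translate(_LAYOUT_SWAP)


def suggest_layout_fix(query: str, known_words: set[str]) -> str | None:
    """Suggest the swapped spelling only when the query itself is unknown."""
    if query in known_words:
        return None  # the query is already a valid word, nothing to correct
    candidate = layout_swap_candidate(query)
    return candidate if candidate in known_words else None


# Usage with a toy word list
print(suggest_layout_fix("mqmqn", {"maman", "papa"}))  # maman
```

Because the swap is its own inverse, the same table also covers the opposite mistake (typing on AZERTY as if it were QWERTY) for these four letters.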

Desired behaviour

The minimum improvement would be for the internal search engine to offer a good suggestion in cases like this one. I'm not aware of the actual algorithms behind the search engine, but the Levenshtein distance (LD) between mqmqn and maman is only 2, so such a close match should be suggested instead of returning no results at all. Compare this with searching for mamqn, which can suggest something like brandwerend maken, at an LD of 14 from the input.
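The distances mentioned in the paragraph above can be computed with the textbook dynamic-programming edit distance, where insertion, deletion and substitution each cost 1. The sketch below is a plain Python illustration, not the search engine's actual ranking; the `closest` helper and its cut-off of 2 edits are hypothetical choices that show how a small threshold keeps maman while discarding distant candidates.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    # previous[j] holds the distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def closest(query: str, candidates: list[str], max_distance: int = 2) -> list[str]:
    """Return candidates within max_distance edits, closest first (hypothetical helper)."""
    scored = sorted((levenshtein(query, c), c) for c in candidates)
    return [c for d, c in scored if d <= max_distance]


print(levenshtein("mqmqn", "maman"))                       # 2
print(levenshtein("mamqn", "brandwerend maken"))           # 14
print(closest("mqmqn", ["maman", "brandwerend maken"]))    # ['maman']
```

Only two rows of the table are kept in memory, so the cost is O(len(a) × len(b)) time and O(len(b)) space.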

It would be even better if the engine could be fed a list of common misspellings, each with a comment explaining its cause. Such a facility could accept both exhaustive specifications (listing exact misspelled forms) and comprehensive ones (describing a whole class of forms with a pattern), for example mqmqn -> maman : "…qwerty [on] azerty…" and p.p. -> papa : "The community decided to abuse the regexp facility to suggest papa as a result for pépé, pipi, popo, pypy and so on." (although this latter example would be a defective use of the feature).

The search results page could then open with a leading text such as: Did you mean “[[maman]]”? This is a common misspelling of a French word resulting from typing the word on a [[w:qwerty|]] keyboard layout as if it were an [[w:azerty|]] one.
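A community-maintained list of this kind could mix exact entries with pattern entries. The Python sketch below is purely hypothetical: the `MisspellingRule` class, the `did_you_mean` helper and the rule format are assumptions made for illustration and do not correspond to any existing CirrusSearch or MediaWiki feature. Patterns are matched against the whole query with `re.fullmatch`, so "p.p." covers four-letter words such as pépé and pipi but not longer words that merely contain them.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class MisspellingRule:
    """Hypothetical rule: an exact form or regex, its target and an explanation."""
    pattern: str
    suggestion: str
    comment: str
    is_regex: bool = False

    def matches(self, query: str) -> bool:
        if self.is_regex:
            # Anchor to the whole query so "p.p." does not fire inside longer words.
            return re.fullmatch(self.pattern, query) is not None
        return query == self.pattern


# Exhaustive entries list exact forms; comprehensive entries use a pattern.
RULES = [
    MisspellingRule(
        "mqmqn", "maman",
        "This is a common misspelling of a French word resulting from typing "
        "the word on a [[w:qwerty|]] keyboard layout as if it were an [[w:azerty|]] one.",
    ),
    MisspellingRule(
        "p.p.", "papa",
        "The community decided to abuse the regexp facility here.",
        is_regex=True,
    ),
]


def did_you_mean(query: str, rules: list[MisspellingRule] = RULES) -> str | None:
    """Build the leading text from the first rule that applies, if any."""
    for rule in rules:
        # Skip rules that would suggest the query itself (p.p. also matches papa).
        if rule.matches(query) and query != rule.suggestion:
            return f"Did you mean “[[{rule.suggestion}]]”? {rule.comment}"
    return None


print(did_you_mean("mqmqn"))
print(did_you_mean("pipi"))
```

The guard against suggesting the query itself illustrates why broad patterns such as p.p. are a poor fit: without it, a search for papa would be "corrected" to papa.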

Event Timeline

Removing Discovery-ARCHIVED, as it is up to each team what they would like to see on their workboard. Adding CirrusSearch (which might add tags again via Herald, though).

MPhamWMF moved this task from needs triage to Language Stuff on the Discovery-Search board.