Add support to English and/or Russian Wikipedia for detecting and converting queries typed in one language on the other language's keyboard.
Examples:
- пукьфт сгшышту (Russian phonetic transliteration: "puk'ft sgshyshtu") looks like gibberish; converting from Russian to American keyboard gives german cuisine.
- 'qatktdf ,fiyz looks like gibberish, but converting from American to Russian keyboard gives эйфелева башня, "Eiffel Tower". (The leading apostrophe is the key that produces э on the Russian layout.)
More details and examples are here.
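The conversion itself is a character-for-character remapping between the two layouts. Below is a minimal sketch, assuming the standard US QWERTY and Russian ЙЦУКЕН layouts and unshifted keys only. The names `LATIN`, `CYRILLIC`, `to_cyrillic`, and `to_latin` are made up for illustration and are not part of any existing search code.

```python
# Unshifted keys of the US QWERTY layout, paired position by position with
# the same physical keys on the standard Russian (ЙЦУКЕН) layout.
# Assumption: shifted symbols and other layout variants are ignored here.
LATIN = "`qwertyuiop[]asdfghjkl;'zxcvbnm,./"
CYRILLIC = "ёйцукенгшщзхъфывапролджэячсмитьбю."

TO_CYRILLIC = str.maketrans(LATIN, CYRILLIC)
TO_LATIN = str.maketrans(CYRILLIC, LATIN)


def to_cyrillic(query: str) -> str:
    """Reinterpret text typed on a US keyboard as Russian-layout keystrokes."""
    # Lowercasing is a simplification; search queries are usually
    # case-insensitive, and it avoids handling shifted punctuation.
    return query.lower().translate(TO_CYRILLIC)


def to_latin(query: str) -> str:
    """Reinterpret text typed on a Russian keyboard as US-layout keystrokes."""
    return query.lower().translate(TO_LATIN)


if __name__ == "__main__":
    print(to_latin("пукьфт сгшышту"))
    print(to_cyrillic("'qatktdf ,fiyz"))
```

Characters that appear in neither layout string (spaces, digits) pass through unchanged. A real implementation would also need to decide how to treat punctuation such as `,` and `.`, which are letters on one layout and punctuation on the other.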
We can use TextCat language detection to detect these transliterable gibberish strings.
Additional requirements for successful implementation include (but are not limited to):
- possibly more data analysis limited to poorly performing queries (the analysis above is on all queries, and so overestimates the cost).
- more complex interaction with language detection, including paying attention to "second place" language results, filtering results after language detection (see notes with more details and examples above), and having differing behaviors for different languages (e.g., showing cross-wiki results for some languages, doing query rewrites or "did you mean" suggestions for other languages).
- coming up with a mechanism for dealing with multiple suggestions (e.g., this plus a spelling correction); possibilities include some sort of confidence score from each suggester, hard-coded ordering, or a nice display of multiple suggestions.
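One way the multiple-suggestion mechanism could look is sketched below. It combines two of the possibilities listed: a per-suggester confidence score, with a hard-coded ordering as the tie-breaker. Everything here is hypothetical, including the `Suggestion` type, the suggester names in `PRIORITY`, the `rank` function, and the 0.5 threshold. No existing suggester is known to expose such a score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    source: str        # which suggester produced it (hypothetical names)
    query: str         # the rewritten query being suggested
    confidence: float  # assumed to be comparable across suggesters, 0..1


# Hard-coded ordering used only to break ties between equal scores.
PRIORITY = ["keyboard", "spelling"]


def rank(suggestions, min_confidence=0.5):
    """Drop weak suggestions, then sort by confidence and fixed priority."""
    kept = [s for s in suggestions if s.confidence >= min_confidence]

    def sort_key(s):
        # Unknown sources sort after all known ones.
        prio = PRIORITY.index(s.source) if s.source in PRIORITY else len(PRIORITY)
        return (-s.confidence, prio)

    return sorted(kept, key=sort_key)


if __name__ == "__main__":
    candidates = [
        Suggestion("spelling", "german cousin", 0.6),
        Suggestion("keyboard", "german cuisine", 0.9),
    ]
    for s in rank(candidates):
        print(s.source, s.query, s.confidence)
```

The first item of the ranked list could drive a query rewrite, while the remaining items could feed a display of multiple suggestions. Whether scores from different suggesters can be made comparable at all is an open question this sketch does not answer.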
Related: