
Create a Latin-to-Devanagari transliteration second-chance search for Hindi wikis
Open, High, Public, Feature

Description

Feature summary:
When a Latin-script query gets too few results (say, 3 or fewer, or perhaps only 0) on Hindi Wikipedia (and possibly other Hindi-language wikis), transliterate the query into Devanagari and try again. This could be done either by automatically searching for the transliterated string or by offering it as a suggestion. The suggestion could be a "did you mean" suggestion shown after the search runs, or a completion-suggester suggestion (the drop-down shown as you type).
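The second-chance flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an actual CirrusSearch implementation. `second_chance_search`, `SearchOutcome`, the threshold constant, and the "Latin-only" check are all hypothetical, and the `search` and `transliterate` callables are placeholders for the real search backend and a real Latin-to-Devanagari transliterator. It covers the automatic-rewrite and "did you mean" options; the completion-suggester option is not shown.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical cutoff: the task suggests "3, or maybe 0". A value of 0 means
# the fallback fires only for zero-result queries.
MAX_RESULTS_FOR_RETRY = 3

# Assumption: only queries made entirely of basic Latin letters, digits,
# whitespace, apostrophes, and hyphens are candidates for transliteration.
_LATIN_QUERY = re.compile(r"[A-Za-z0-9\s'\-]+")


@dataclass
class SearchOutcome:
    query: str                         # the query whose results are shown
    results: List[str]                 # result titles for that query
    suggestion: Optional[str] = None   # "did you mean" text, if any
    rewritten: bool = False            # True if the transliteration was searched automatically


def second_chance_search(
    query: str,
    search: Callable[[str], List[str]],
    transliterate: Callable[[str], Optional[str]],
    auto_rewrite: bool = True,
    max_results_for_retry: int = MAX_RESULTS_FOR_RETRY,
) -> SearchOutcome:
    """Retry a sparse Latin-script query in Devanagari, or suggest it."""
    results = search(query)
    # Enough results, or not a Latin-only query: keep the original search.
    if len(results) > max_results_for_retry or not _LATIN_QUERY.fullmatch(query):
        return SearchOutcome(query, results)

    candidate = transliterate(query)
    if not candidate or candidate == query:
        return SearchOutcome(query, results)

    alt_results = search(candidate)
    # Only use the transliteration if it actually finds more results.
    if len(alt_results) <= len(results):
        return SearchOutcome(query, results)

    if auto_rewrite:
        # Option 1: automatically show results for the transliterated query.
        return SearchOutcome(candidate, alt_results, rewritten=True)
    # Option 2: keep the original results and offer a "did you mean" link.
    return SearchOutcome(query, results, suggestion=candidate)


if __name__ == "__main__":
    # Toy stand-ins: a dict acting as the index and a one-entry
    # transliteration table (both hypothetical).
    index = {"तेंदुआ": ["तेंदुआ"]}
    toy_transliterations = {"tendua": "तेंदुआ"}
    outcome = second_chance_search(
        "tendua", lambda q: index.get(q, []), toy_transliterations.get
    )
    print(outcome)
```

The design choice here is to accept the transliteration only when it beats the original query's result count. This keeps a poor transliteration from replacing a query that already had a few relevant hits.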

For example, the query tendua gets zero results, but transliterated to तेंदुआ ("leopard"), it gets 190+ results, including an article with that exact title.

Use case:
Based on a brief investigation in T294257, Hindi Wikipedia has a very high zero-results rate (60%+). The majority of non-junk zero-result queries are written entirely in Latin script, and most of these Latin queries appear to be Devanagari text transliterated into Latin. When automatically transliterated back, almost half of them get either Wikipedia results or sister-search results (from Wiktionary, etc.).

The estimate is that up to 23.3% (± 11.9%) of non-junk zero-result queries on Hindi Wikipedia could be rehabilitated with a reasonably good Latin-to-Devanagari transliteration, i.e., roughly ¼ of non-junk zero-result queries and perhaps ~15% of all queries.

Benefits:
A significant number of queries that currently get no results would get at least some results. This would help a large number of searchers, some of whom may have limited access to non-Latin keyboards.