Follow-up to T178996
Gerrit patch 386158 made it easier to search for a few languages in the ULS search box. There are still some cases that are hard to find there, however. This task will list these languages with explanations. This will mostly be based on analyzing the "no-search-results" event in event logging. The description may get updated every now and then based on new data.
(Meta-comment about tagging projects and people: 1. This is a general ULS problem, but tagging Compact Links, because it is one of the most visible areas at the moment. 2. I added some people who may be interested in this issue, or who may have some input. If you are not interested, please unsubscribe and accept my apologies for the spam.)
Special issues for Chinese and Japanese
Except for "Zhongwen", which can be resolved with a patch similar to 404799, these are not very common in the failed-search statistics, and they are somewhat complicated to handle.
- zhong -> Chinese (all variants). Among the most frequent search failures. "Zhongwen" is the standard Latin pinyin transliteration for the name of the Chinese language, so it should be findable.
- 繁體 -> Traditional Chinese (zh-hant). This does appear in the data, but perhaps we can optimize this in interlanguage links and take people directly to the Chinese Wikipedia in the traditional variant. (Although generalizing this for sites other than Wikipedia can be challenging.)
- 简体, 简体中文 -> Simplified Chinese (zh-hans). Similar to "繁體 -> Traditional Chinese" above, but for Simplified Chinese.
- にほ -> Japanese (ja). This is the beginning of the Japanese autonym as spelled in Hiragana (にほんご), one of the Japanese writing systems. It appears surprisingly frequently in failed searches, so it should be supported. It may happen because of a race condition between a Hiragana-based IME and ULS's search algorithm, or for other reasons.
- ㄓㄨ -> Chinese (?). This is Bopomofo, an auxiliary writing system on which some Chinese input methods are based. It's unclear what people are trying to find when they search for it, however.
- 汉语 -> Chinese (?). This refers to the spoken Chinese language. It's unclear what people are searching for with this string. It could be the Pinyin transliteration system, the name of which begins with the same characters (汉语拼音方案), but that's not really a language in the sense that is usually used in ULS.
- Jiantizhongwen, Jianti, Jian -> Simplified Chinese (zh-hans). These are the Pinyin transliteration of 简体中文 and prefixes of it.
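To illustrate the kind of lookup the items above ask for, here is a minimal sketch of prefix matching against a table of extra search aliases. It is not how ULS is implemented. The `ALIASES` table, the `find_by_alias_prefix` function, and the `min_len` parameter are all hypothetical names, and mapping "zhongwen" to the generic `zh` code is an assumption made only for this example.

```python
# Hypothetical alias table: extra search strings mapped to language codes.
# The strings come from the failed searches listed above; the "zh" code
# for "Chinese (all variants)" is an assumption for this sketch.
ALIASES = {
    "zhongwen": "zh",
    "jiantizhongwen": "zh-hans",
    "简体中文": "zh-hans",
    "繁體": "zh-hant",
    "にほんご": "ja",
}


def find_by_alias_prefix(query, min_len=2):
    """Return the sorted language codes whose alias starts with the query."""
    q = query.strip().casefold()
    if len(q) < min_len:
        # Very short queries would match too many aliases to be useful.
        return []
    return sorted({code for alias, code in ALIASES.items() if alias.startswith(q)})


if __name__ == "__main__":
    for q in ["zhong", "Jian", "Jianti", "简体", "繁體", "にほ"]:
        print(q, "->", find_by_alias_prefix(q))
```

With such a table, partial input from an IME (for example にほ before the user finishes typing にほんご) would still resolve to a language instead of producing a "no-search-results" event.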
Other
- tiêng -> Vietnamese (vi) or maybe something else. The word "tiếng" means "language" in Vietnamese, so many language names in Vietnamese begin with this word. "tiêng" is a misspelling, but event logging data shows that it's very common, so our algorithm should treat it accordingly. Searching for "tiếng" finds Vietnamese, and searching for "tiêng" should do the same. We should treat differences in Latin diacritics the same way we treat other simple spelling errors.
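One possible way to treat Latin diacritic differences as equivalent is to fold both the query and the candidate names before comparing them. The sketch below uses Unicode decomposition from the Python standard library to do that. The function names `fold_diacritics` and `same_ignoring_diacritics` are hypothetical, and this is only an illustration of the idea, not the ULS search algorithm.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so that 'tiếng' and 'tiêng' both become 'tieng'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (characters with a non-zero combining class).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).casefold()


def same_ignoring_diacritics(a, b):
    """Compare two strings after folding diacritics and case."""
    return fold_diacritics(a) == fold_diacritics(b)


if __name__ == "__main__":
    print(fold_diacritics("tiếng"), fold_diacritics("tiêng"))
    print(same_ignoring_diacritics("tiếng", "tiêng"))
```

Note that this only handles marks that decompose into a base letter plus combining characters. Letters such as "đ" have no such decomposition and would need a separate mapping.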