Gerrit patch 386158 made it easier to search for a few languages in the ULS search box. Some languages are still hard to find there, however. This task lists them with explanations, mostly based on analyzing the "no-search-results" event in event logging. The description may be updated occasionally as new data comes in.
(Meta-comment about tagging projects and people: 1. This is a general ULS problem, but tagging Compact Links, because it is one of the most visible areas at the moment. 2. I added some people who may be interested in this issue, or who may have some input. If you are not interested, please unsubscribe and accept my apologies for the spam.)
Transliterated and alternate autonyms
- hay, hayeren -> Armenian (hy). "Hayeren" is the Latin transliteration of the autonym of the Armenian language. (This is similar to "Kartuli", the transliterated autonym for Georgian, which was already added, and should be easy to fix.) Gerrit patch
- qartuli -> Georgian (ka). Like Kartuli. Gerrit patch
- nihongo -> Japanese (ja). This is the Latin transliteration of the Japanese autonym. Gerrit patch
- castellano -> Spanish (es). This is a common variant name for Spanish, and it doesn't appear in the data. (The closest thing that we do have is 'castelán' => 'es-es', but this is not 'es', so it cannot actually be found.) Gerrit patch
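As a rough illustration of the fixes above, here is a minimal Python sketch of prefix search over a table of alternate names. It is not the ULS implementation: the `ALIASES` table and the `search_aliases` function are hypothetical names made up for this example, and the real data format in the language data may differ.

```python
# Hypothetical alias table: alternate or transliterated name -> language code.
# The entries mirror the cases listed above; the structure is an assumption.
ALIASES = {
    "hayeren": "hy",
    "kartuli": "ka",
    "qartuli": "ka",
    "nihongo": "ja",
    "castellano": "es",
}


def search_aliases(query: str) -> list[str]:
    """Return the sorted language codes whose alias starts with the query."""
    q = query.strip().lower()
    if not q:
        return []
    codes = {code for alias, code in ALIASES.items() if alias.startswith(q)}
    return sorted(codes)


print(search_aliases("hay"))         # prefix of "hayeren"
print(search_aliases("castellano"))  # full alias
```

Note that the alias must point at the code that is actually offered in the selector: an entry that maps to 'es-es' would not help anyone find 'es'.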
Languages with script variants and redirects
- каз -> Kazakh (kk). It works correctly with "kaz" (from English) and with "қаз" (the correct Kazakh spelling), but not with "каз", which is either the Russian spelling or the incorrect Kazakh spelling. The letter "қ" is the appropriate letter to use in the Kazakh alphabet, but perhaps some people have a hard time typing it, and type "к" instead. This one is a bit strange, because "kazakh", "казахский", and "қазақ" all appear in the data.
- az, azer, azerba -> Azerbaijani (az). This one is also strange, because "azerbaijani" appears in the data. Occasionally this finds South Azerbaijani (azb), but this is not enough, because these languages are similar in speech, but completely different in writing. This requires some debugging. Searching by "azərb" does work. This is the correct spelling in the Azerbaijani language itself, but it's possible that some people cannot type the letter "ə".
- аз, азер -> Azerbaijani (az). This is similar to the above, but with Cyrillic.
- srpski -> Serbian (sr). Currently it either finds Serbo-Croatian (sh), which is a separate Wikipedia, or doesn't find anything at all. It must find Serbian (sr). Possibly related to T121747. (Related pull request)
- punjabi -> Punjabi Western, Punjabi Eastern (both pnb and pa/pa-guru). Gerrit patch 386158 made it possible to find both pnb and pa in the languagesearch API, but pa still doesn't appear in the frontend, probably because it's a redirect.
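One possible way to handle the hard-to-type letters mentioned above ("қ" typed as "к", "ə" typed as "e") is to fold them on both the query and the stored names before comparing. The Python sketch below only illustrates that idea under stated assumptions: `FOLD`, `NAMES`, `fold`, and `search_folded` are hypothetical, the two names are sample data, and this is not how ULS currently matches queries.

```python
# Hypothetical folding table: letters that some users cannot type easily,
# mapped to the letter they are likely to type instead.
FOLD = str.maketrans({"қ": "к", "Қ": "К", "ə": "e", "Ə": "E"})

# Sample names for illustration only: name -> language code.
NAMES = {
    "қазақша": "kk",
    "azərbaycanca": "az",
}


def fold(text: str) -> str:
    """Replace hard-to-type letters with easier ones and lowercase the result."""
    return text.translate(FOLD).lower()


def search_folded(query: str) -> list[str]:
    """Prefix search that compares folded forms of the query and the names."""
    q = fold(query.strip())
    if not q:
        return []
    return sorted({code for name, code in NAMES.items() if fold(name).startswith(q)})


print(search_folded("каз"))   # typed with Cyrillic "к" instead of "қ"
print(search_folded("azer"))  # typed with "e" instead of "ə"
```

Because both sides are folded, the correct spellings ("қаз", "azərb") keep working alongside the simplified ones.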
Other
- English -> Simple English (simple; in the future, en-simple). English (en) is found, but Simple English (simple, en-simple) must also appear in the results when searching for English.
- Banyumasan, Ngapak -> Banyumasan (map-bms). This language does not have a standard language code, so it doesn't appear in our data. It also doesn't appear as a missing language in the event logging about search, but it was mentioned at T132021 by @Nikola_Smolenski as a language that cannot be found. Perhaps it should also appear in the results when people search for Javanese (jv), because according to Wikipedia it's a dialect of Javanese, but this must be checked.
- in -> Indonesian (id). Not high priority, because "indo" and similar queries do find it, but it would be nice to fix. (See T132021.)
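The "English -> Simple English" case, and possibly the "Javanese -> Banyumasan" case, could be handled by expanding search results with related languages. The Python sketch below shows one way to do that; the `RELATED` table and `expand_results` function are hypothetical, and whether map-bms belongs under jv is still an open question, as noted above.

```python
# Hypothetical table: language code -> related codes that should also be shown.
# The jv -> map-bms entry is an assumption that still needs to be checked.
RELATED = {
    "en": ["simple"],
    "jv": ["map-bms"],
}


def expand_results(codes: list[str]) -> list[str]:
    """Append related languages after each result, without duplicates."""
    expanded: list[str] = []
    for code in codes:
        for candidate in [code, *RELATED.get(code, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_results(["en"]))
```

Each related language is placed directly after the language that triggered it, so the direct match stays first in the list.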
Follow-up work is described at T186781.