
Searching for "Punjabi" in Compact Language Links works strangely
Closed, ResolvedPublic

Description

To reproduce:

  1. Make sure that you have Compact Language Links enabled in the French Wikipedia.
  2. Go to https://fr.wikipedia.org/wiki/Lahore (The article "Lahore" in the French Wikipedia)
  3. Click "90 de plus" (90 more [languages])
  4. Type "p" (one letter) in the search box.
    1. Several languages whose name begins with "P" appear: Portuguese, Polish, etc. Scroll down. Western Punjabi (پنجابی) and Eastern Punjabi (ਪੰਜਾਬੀ) both appear under "Asie" (Asia), as expected.
  5. Go back to the search box and add "u", so that the search string will be "pu".
    1. Several languages still appear in the results, but no variety of Punjabi appears there.

I tried playing with the languagesearch API and got even stranger results. Searching with the strings "p", "pu" and "pun" finds "pa", but not "pnb".
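For anyone who wants to repeat the API check, here is a minimal sketch that builds the query URLs. It assumes the standard `/w/api.php` endpoint and that the module takes the search string in a `search` parameter; `languageSearchUrl` is a hypothetical helper name, not part of ULS.

```javascript
// Build a languagesearch query URL for a given wiki and search string.
// Assumption: the module is reachable at /w/api.php and reads the "search" parameter.
function languageSearchUrl(origin, query) {
  const params = new URLSearchParams({
    action: 'languagesearch',
    format: 'json',
    search: query
  });
  return `${origin}/w/api.php?${params}`;
}

// The three search strings mentioned above.
const urls = ['p', 'pu', 'pun'].map((q) =>
  languageSearchUrl('https://fr.wikipedia.org', q)
);
```

Opening each URL in a browser and comparing the returned language codes should show whether "pa" and "pnb" are both present.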

Event Timeline


I think I see why this happens.

In jquery.uls.languagefilter.js, the search API is only called if the local filter finds no results in the box itself. This works badly when some languages have an autonym or ISO code that begins with the letter P in Latin script, but the user is searching for a language whose autonym is not written in Latin script. The same thing happens when typing "ma" while searching for Malayalam.

This was done as an optimization back in 2012. It's line 117 in the linked diff, and line 198 in the current master. It must be changed. Otherwise it's impossible to find some languages using the search box.

Removing the check for this.resultCount is probably not the only thing to fix, though...
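The behaviour described above can be illustrated with a toy model. This is only a sketch, not the actual jquery.uls code: the language data, `fakeSearchApi`, and all function names are hypothetical, and the merged variant is just one possible direction, not necessarily the fix that should be made.

```javascript
// Toy data: language code -> autonym. Not the real ULS language database.
const languages = {
  hu: 'magyar',
  ml: 'മലയാളം',
  pt: 'português'
};

// Stand-in for the languagesearch API, which can match names written in
// other scripts. The response is hard-coded for illustration.
function fakeSearchApi(query) {
  const crossLanguageNames = { malayalam: 'ml' };
  return Object.keys(crossLanguageNames)
    .filter((name) => name.startsWith(query.toLowerCase()))
    .map((name) => crossLanguageNames[name]);
}

// Local filter: match the query against the code or the autonym.
function localMatches(query) {
  const q = query.toLowerCase();
  return Object.keys(languages).filter(
    (code) => code.startsWith(q) || languages[code].toLowerCase().startsWith(q)
  );
}

// Behaviour described above: the API is consulted only when nothing
// matched locally.
function searchWithShortCircuit(query) {
  const local = localMatches(query);
  return local.length === 0 ? fakeSearchApi(query) : local;
}

// One possible alternative: always merge local and API results.
function searchMerged(query) {
  return [...new Set([...localMatches(query), ...fakeSearchApi(query)])];
}
```

With this data, typing "ma" matches "magyar" locally, so the short-circuit version never asks the API and Malayalam is missed, while the merged version returns both.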

Amire80 moved this task from Backlog to Missing languages on the ULS-CompactLinks board.

Classifying this as "missing languages" and assigning high priority, because this makes some languages impossible to find through search.

Change 386158 had a related patch set uploaded (by Amire80; owner: Amire80):
[mediawiki/extensions/UniversalLanguageSelector@master] Add special language names to facilitate searching

https://gerrit.wikimedia.org/r/386158

Change 386158 merged by jenkins-bot:
[mediawiki/extensions/UniversalLanguageSelector@master] Add special language names to facilitate searching

https://gerrit.wikimedia.org/r/386158

This seems to work for pnb now, but not for pa. The languagesearch API now serves both, but pa still doesn't appear in the frontend, probably because it's a redirect. It's possible that https://github.com/wikimedia/jquery.uls/pull/275 fixes it, but it needs careful testing.

To summarize the current state on cawiki (wmf.7):

| Search string | Result |
| --- | --- |
| p | Western Punjabi and Eastern Punjabi |
| pa | Eastern Punjabi only (and the language name is translated as Punjabi, so Eastern Punjabi lang name does not appear) |
| pun | Eastern Punjabi |
| pan (ISO 639-3 for Eastern Punjabi) | nothing |
| pu | Eastern Punjabi |
| pnb | Western Punjabi |
| punjabi | Eastern Punjabi |

Only ISO 639-3 treats Eastern Punjabi (pan) and Western Punjabi (pnb) as separate language codes. Previously there was only one code for Panjabi/Punjabi (pa or pan). Logically, it would be helpful to see both Western Punjabi and Eastern Punjabi in the search results when entering p, pu, pun, or even pan (for panjabi).
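The expectation above could be expressed as an alias table mapping extra search strings to the codes that should be suggested. This is a hypothetical sketch: `searchAliases` and `codesForQuery` are illustrative names, the table entries are chosen only to match the expectation stated here, and none of this is the format ULS actually uses.

```javascript
// Hypothetical alias table: extra search strings -> language codes to suggest.
const searchAliases = {
  pan: ['pa'],
  panjabi: ['pa', 'pnb'],
  punjabi: ['pa', 'pnb']
};

// Return every code whose alias starts with the typed prefix.
function codesForQuery(query) {
  const q = query.toLowerCase();
  const codes = new Set();
  for (const [alias, targets] of Object.entries(searchAliases)) {
    if (alias.startsWith(q)) {
      targets.forEach((code) => codes.add(code));
    }
  }
  return [...codes].sort();
}
```

Under these assumed aliases, the prefixes p, pu, pun and pan all yield both pa and pnb.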

Checked in cx-testing

| Search string | Result |
| --- | --- |
| p | Western Punjabi and Eastern Punjabi |
| pa | Eastern Punjabi |
| pan (ISO 639-3 for Eastern Punjabi) | nothing |
| pu | Western Punjabi and Eastern Punjabi |
| pun | Western Punjabi and Eastern Punjabi |
| pnb | Western Punjabi |
| punjabi | Western Punjabi and Eastern Punjabi |

Btw, srpski (latinica) and српски (ћирилица)‎ both work - the correct suggestion is provided.
Since the objective of this ticket was to show both Western Punjabi and Eastern Punjabi for pu, I am closing this ticket as resolved. There might be additional improvements, e.g. Eastern Punjabi should be suggested for pan, since that is its ISO 639-3 code. It may also be helpful to suggest both Eastern and Western Punjabi for pa, based on the alternative spelling "panjabi".

Another note: in cx-testing, the tooltips for suggested languages are shown in the suggested language itself (e.g. the tooltip for espanol is español). This is not the case in production, where tooltips are displayed in the user-selected UI language.