Normalize urls in Content and Section Translation to standard codes
Closed, ResolvedPublic

Description

Norwegian Bokmål uses the "nb" code but the general code for Norwegian "no" is also used in some contexts, causing confusion to our systems.
For example, Content Translation shows inconsistencies in language code treatment at several places:

  • On the translation dashboard, starting a translation by searching for a new article results in the "nb" code being used, while starting from a suggestion results in the "no" code being used instead.
  • Starting a translation with the "no" or "nb" code results in a different set of machine translation services being listed, since "nb" is used for the Yandex and Apertium configurations and "no" is used for Google and MinT.
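
The second point can be illustrated with a toy lookup. This is only a sketch: the table and the function name `providers_for` are hypothetical, and the language sets are invented to mirror the pairing described above (the real provider configuration is not shown in this ticket and will differ).

```python
# Hypothetical per-provider language tables. They only mirror the pairing
# described in the ticket: Yandex and Apertium are configured with "nb",
# Google and MinT with "no". The actual configuration is an assumption.
PROVIDER_LANGUAGES = {
    "Yandex": {"en", "nb"},
    "Apertium": {"en", "nb"},
    "Google": {"en", "no"},
    "MinT": {"en", "no"},
}


def providers_for(source: str, target: str) -> list[str]:
    """Return the providers whose table contains both codes exactly as given."""
    return sorted(
        name
        for name, codes in PROVIDER_LANGUAGES.items()
        if source in codes and target in codes
    )


print(providers_for("en", "nb"))  # ['Apertium', 'Yandex']
print(providers_for("en", "no"))  # ['Google', 'MinT']
```

Because each lookup matches the code literally, the same language pair yields two disjoint provider lists depending on which code reached the lookup.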

Since external tools can create their own urls using any version of the language codes, this ticket is focused on normalizing the language codes in the url for languages that may have alternative codes. For the case of Norwegian, a url using "no" as the source or target parameter will be converted to use "nb". This way, external and internal inconsistencies are corrected when the url is processed.
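
A minimal sketch of this normalization, assuming the source and target codes travel in the `from` and `to` query parameters as in the example urls attached to this ticket. The names `CODE_ALIASES`, `LANGUAGE_PARAMS` and `normalize_language_params` are hypothetical and are not taken from the ContentTranslation code base; the actual fix may look quite different.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Alternative code -> expected code. Only the Norwegian pair comes from the
# ticket; further entries would be added the same way (assumption).
CODE_ALIASES = {"no": "nb"}

# Query parameters assumed to carry a language code
# ("from" = source language, "to" = target language).
LANGUAGE_PARAMS = ("from", "to")


def normalize_language_params(url: str) -> str:
    """Rewrite alternative language codes in the source/target parameters."""
    parts = urlsplit(url)
    query = [
        (key, CODE_ALIASES.get(value, value) if key in LANGUAGE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(normalize_language_params(
    "https://no.wikipedia.org/w/index.php?from=en&to=no&page=Paneer"
))
# https://no.wikipedia.org/w/index.php?from=en&to=nb&page=Paneer
```

Only the two language parameters are rewritten. The hostname (`no.wikipedia.org`) and any other parameter that happens to contain "no" are left untouched.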

Additional considerations:

  • We want to make sure the solution works for Content Translation on desktop as well as Section Translation on mobile.
  • Additional efforts to avoid inconsistencies where they originate are also welcome, but the focus of this ticket is to make sure that when alternative codes are used, they are normalized to the expected ones.
Only Yandex shown when starting a translation by searching for a new article which uses 'nb' code in the url (example)
Google and MinT shown when starting a translation from suggestions which uses 'no' in the url (example)

We are normalizing codes to "nb" as per T339091#8980233 (the opposite direction was originally proposed, which may be problematic in the long term).

Event Timeline

For the case of Norwegian, a url using "nb" as source or target parameter will be converted into using the "no" one.

Why not the opposite way around? no is a macrolanguage code covering both nb and nn, so no and nb aren't actually synonyms even though Wikimedia usually treats them as such. We use no in the Wikipedia URL because it's (currently) technically impossible to move a wiki, but if it was possible we would move it. I've tried for years to get us (Wikimedia) to not use no wherever possible, so if it would be possible to use nb instead of no here, I'd be grateful.

It's not too big a deal though: as long as the functionality for no and nb is the same (which is the goal of this task), that should be the main priority. Still, it feels like normalizing to no is just creating problems for our future selves when no.wikipedia.org can finally be moved to nb.wikipedia.org.

Sorry if it seems like I'm beating a dead horse here 🙈

Thanks for the input @jhsoby. The main intention of the ticket is to avoid the fragmentation. I suggested the "no" code because I took for granted that its use in the url would be a good reference, but I lacked the context you provided. So I think it may be preferable to do it the other way around. I'll update the ticket.

Pginer-WMF added a subscriber: santhosh.

The situation has improved a bit since the ticket was originally reported. Now both examples from the description result in the same list of MT providers being shown (Yandex and MinT). However, Google is missing where it is expected to appear. Something may still need fixing in the Google config or the code that processes it.

Change 970790 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] CX Suggestion List: Use the language filter codes for translation start

https://gerrit.wikimedia.org/r/970790

Change 970790 merged by Nik Gkountas:

[mediawiki/extensions/ContentTranslation@master] CX Suggestion List: Use the language filter codes for translation start

https://gerrit.wikimedia.org/r/970790

Code unification is completed. I created a follow-up ticket to make sure Google is shown in the list: T352747: Google is not listed as an option for Norwegian