
Wikipedia portal: adjust the languages used for Chinese translations
Closed, ResolvedPublic

Description

It looks like we might be handling the Chinese translations a bit off: we currently look only at the primary subtag of the language code from translatewiki (zh) rather than the full zh-{{variant}} code, so we're not showing all possible translations, such as zh-yue for Cantonese.

Let's take a look and see what we can do about this.
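To make the problem concrete, here is a minimal TypeScript sketch of resolving a browser language code against the list of available translations. It tries the full code first and only then falls back to shorter prefixes, down to the primary subtag. The function name `resolveL10nCode` and the `available` list are hypothetical; the portal's actual variable names and lookup logic may differ.

```typescript
// Hypothetical helper: pick the most specific translation code that exists.
// `available` stands in for a list of l10n codes exposed to the page
// (an assumption; no real variable name is given in this task).
function resolveL10nCode(browserLang: string, available: readonly string[]): string | null {
  const known = new Set(available.map((code) => code.toLowerCase()));
  const subtags = browserLang.toLowerCase().split("-");
  // Try the full code first (e.g. "zh-yue"), then drop trailing subtags
  // one at a time ("zh-hant-tw" -> "zh-hant" -> "zh").
  for (let n = subtags.length; n > 0; n--) {
    const candidate = subtags.slice(0, n).join("-");
    if (known.has(candidate)) {
      return candidate;
    }
  }
  return null; // no translation available; keep the default strings
}

// Looking only at the primary subtag would map every zh-* code to "zh".
console.log(resolveL10nCode("zh-yue", ["zh", "zh-yue"])); // prints zh-yue
console.log(resolveL10nCode("zh-TW", ["zh", "zh-yue"]));  // prints zh
console.log(resolveL10nCode("fr-CA", ["zh"]));            // prints null
```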

Event Timeline

Change 374535 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Verifying l10n file exists before ajax request.

https://gerrit.wikimedia.org/r/374535

Change 383337 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Exposing available translations in JS variable

https://gerrit.wikimedia.org/r/383337

Change 383338 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Checking if l10n available before translating page

https://gerrit.wikimedia.org/r/383338

Change 383339 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Use the browsers full language codes for translation

https://gerrit.wikimedia.org/r/383339

Change 383340 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Exposing zh variant translations

https://gerrit.wikimedia.org/r/383340

Change 374535 abandoned by Jdrewniak:
Verifying l10n file exists before ajax request.

Reason:
abandoning this patch in favor of the patches on this topic https://gerrit.wikimedia.org/r/#/q/topic:T171647-exposing-chinese-variants

https://gerrit.wikimedia.org/r/374535

The way wm-portal.js handled the Chinese localization was by baking the Traditional Chinese strings into the page, along with Simplified Chinese versions in data-convert-hans and data-converttitle-hans attributes; [convertChinese()](https://phabricator.wikimedia.org/diffusion/WPOR/browse/master/dev/wikipedia.org/assets/js/wm-portal.js;66831d2a556c51b52344454108129c8bcf286829$116) would then swap in the Simplified Chinese strings if the browser’s language was zh-hans, zh-cn, zh-sg, or zh-my.
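A minimal, DOM-free TypeScript sketch of the selection logic described above, assuming a case-insensitive comparison of the browser language. `ConvertibleString`, `prefersSimplified`, and `displayText` are hypothetical names, not the actual wm-portal.js code; in a browser, the data-convert-hans value would be readable as `element.dataset.convertHans`.

```typescript
// Browser languages that, per the description above, made convertChinese()
// swap in the Simplified Chinese strings.
const HANS_LANGS: ReadonlySet<string> = new Set(["zh-hans", "zh-cn", "zh-sg", "zh-my"]);

// Assumption: comparison is case-insensitive, since browsers may report "zh-CN".
function prefersSimplified(browserLang: string): boolean {
  return HANS_LANGS.has(browserLang.toLowerCase());
}

// Hypothetical model of one string baked into the page.
interface ConvertibleString {
  text: string;         // Traditional Chinese, as rendered in the HTML
  convertHans?: string; // Simplified Chinese from data-convert-hans, if any
}

function displayText(s: ConvertibleString, browserLang: string): string {
  // Keep the baked-in Traditional text when no Simplified version exists.
  if (prefersSimplified(browserLang) && s.convertHans !== undefined) {
    return s.convertHans;
  }
  return s.text;
}

const wikipedia: ConvertibleString = { text: "維基百科", convertHans: "维基百科" };
console.log(displayText(wikipedia, "zh-CN")); // prints 维基百科
console.log(displayText(wikipedia, "zh-TW")); // prints 維基百科
```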

None of the major browsers have an official translation in Min Nan, so I’d expect few users to have configured their browsers to prefer Min Nan independently of the UI language. That said, I think it’s important to set lang="nan": lang="zh-min-nan" causes browsers to choose CJK fonts, whereas the Min Nan Wikipedia uses the Latin-based Pe̍h-ōe-jī alphabet exclusively. Someday browsers may recognize standard three-letter language codes for font substitution, but that certainly won’t happen for ad-hoc codes like zh-min-nan.

We created the JSON files in l10n from Module:Project portal/wikis, naming the files based on wiki subdomains (e.g., zh-yue.json). Meanwhile, translatewiki.net lays down JSON files named for ISO 639 codes (hence yue.json). So for some languages, we have some properties in one file and some in another. Had anyone translated the strings into Literary Chinese (lzh) at translatewiki.net, we would’ve had both zh-classical.json and lzh.json for the same wiki.
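The naming mismatch can be captured with a small lookup table. This is a sketch using only the three subdomain/ISO pairs mentioned in this thread (zh-yue → yue, zh-classical → lzh, zh-min-nan → nan); the helper names are hypothetical. The same ISO code is what the lang attribute should carry, per the point about lang="nan" above.

```typescript
// Wiki subdomains from this thread whose ISO 639 code differs from the subdomain.
const ISO_BY_SUBDOMAIN: ReadonlyMap<string, string> = new Map([
  ["zh-yue", "yue"],
  ["zh-classical", "lzh"],
  ["zh-min-nan", "nan"],
]);

// ISO 639 code for a wiki subdomain; also suitable for a lang attribute.
function isoCodeFor(subdomain: string): string {
  return ISO_BY_SUBDOMAIN.get(subdomain) ?? subdomain;
}

// File names that may hold strings for one wiki: the subdomain-based file
// generated from Module:Project portal/wikis and the ISO-based file from
// translatewiki.net, deduplicated when the two names coincide.
function l10nFilesFor(subdomain: string): string[] {
  return Array.from(new Set([`${subdomain}.json`, `${isoCodeFor(subdomain)}.json`]));
}

console.log(l10nFilesFor("zh-yue").join(", ")); // prints zh-yue.json, yue.json
console.log(l10nFilesFor("fr").join(", "));     // prints fr.json
console.log(isoCodeFor("zh-min-nan"));          // prints nan
```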

@mxn wrote:

... if the browser’s language was zh-hans, zh-cn, zh-sg, or zh-my.

I don't think zh-my is needed as no actual users will use it.

@mxn Good point on lang="nan"; I'll revert that in the patch here.
Also, regarding your point on yue.json and zh-yue.json: in this specific instance the translation strings don't collide, so the two files can be merged without much conflict. If such a conflict did arise in the future, as it could with lzh, I suppose the newer file could take precedence?
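A sketch of the "newer file wins" merge suggested here, which also reports colliding keys so that a conflict like the lzh case wouldn't pass silently. Deciding which file counts as newer (e.g. by modification time) is left to the caller; the function name and the sample message keys are made up for illustration.

```typescript
type Messages = Record<string, string>;

interface MergeResult {
  merged: Messages;
  conflicts: string[]; // keys defined in both files with different values
}

// Merge two l10n message files; on a collision the newer file's value wins.
function mergeMessages(older: Messages, newer: Messages): MergeResult {
  const conflicts = Object.keys(newer).filter(
    (key) => Object.prototype.hasOwnProperty.call(older, key) && older[key] !== newer[key],
  );
  // Spread order matters: properties from `newer` overwrite those from `older`.
  return { merged: { ...older, ...newer }, conflicts };
}

// Hypothetical message keys, for illustration only.
const result = mergeMessages(
  { "search-label": "A", "footer-text": "B" },
  { "footer-text": "C", "app-link": "D" },
);
console.log(result.merged["footer-text"]); // prints C
console.log(result.conflicts.join(", "));  // prints footer-text
```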

As for the language codes, I apologize for my unfamiliarity with the correct mappings. In Chrome, for example, zh-tw is described as "Chinese (traditional)", which I assumed could be mapped to the zh-classical wiki. As it stands, though, I'm still not sure for which browser language codes we should show the zh-classical wiki in the top ten.

[Attached screenshot: Screen Shot 2017-10-12 at 2.14.02 PM.png]

For the rest of the Chinese language codes, would the following mapping be correct?

| Browser code | Translation file |
| --- | --- |
| zh, zh-hant | zh-hant.json |
| zh-hans, zh-cn, zh-sg | zh-hans.json |
| zh-hk | zh-yue.json + yue.json |

Currently, we're only serving the zh.json file for anyone who has a zh-* browser language. The zh-hant.json file seems to be a more complete translation than zh.json. Would it be preferable to serve it in place of zh.json, or should we combine the two?

debt changed the task status from Open to Stalled (Nov 2 2017, 3:52 PM).

If there is anyone that can help with this, we'd really appreciate the feedback! :)

debt lowered the priority of this task from Medium to Low (Nov 28 2017, 4:38 PM).
debt moved this task from Needs code review to Backlog on the Discovery-Portal-Sprint board.

Moving to the backlog, as we don't have a clear path forward at this time on how to deal with all the various Chinese language variants.

Moving to the backlog board for review at a future date.

debt added a subscriber: Jdrewniak.
Aklapper changed the task status from Stalled to Open (May 14 2020, 12:19 PM).

The previous comments don't explain what/who exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks... > Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task is out of scope and nobody should ever work on this, then task status should have the "Declined" status.)


zh.json should be deleted, since its content is a mixture of Simplified and Traditional Chinese and is quite incomplete.

And I think the best approach is to follow the fallback mechanism of MediaWiki's LanguageConverter:

| Browser code | Translation file |
| --- | --- |
| zh, zh-hans, zh-cn, zh-sg, zh-my | zh-hans.json |
| zh-hant, zh-tw, zh-hk, zh-mo | zh-hant.json |

zh-yue.json, yue.json and zh-classical.json should not be used for UI translation.
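The fallback table above can be expressed as a lookup over the user's preferred languages (for example, the list in `navigator.languages`). This is a minimal sketch, assuming exact, case-insensitive matches on the listed codes. Codes that carry both a script and a region, such as zh-Hant-TW, are not in the table and would fall through here, as would zh-yue and zh-classical, in line with the suggestion not to use those files for UI translation. The names are hypothetical.

```typescript
type ZhVariantFile = "zh-hans.json" | "zh-hant.json";

// One entry per browser code in the table, following LanguageConverter's fallbacks.
const ZH_FILE_BY_CODE = new Map<string, ZhVariantFile>([
  ["zh", "zh-hans.json"],
  ["zh-hans", "zh-hans.json"],
  ["zh-cn", "zh-hans.json"],
  ["zh-sg", "zh-hans.json"],
  ["zh-my", "zh-hans.json"],
  ["zh-hant", "zh-hant.json"],
  ["zh-tw", "zh-hant.json"],
  ["zh-hk", "zh-hant.json"],
  ["zh-mo", "zh-hant.json"],
]);

// Walk the preferred languages in order and return the first Chinese
// translation file the table covers, or null if none match.
function chineseL10nFile(preferred: readonly string[]): ZhVariantFile | null {
  for (const lang of preferred) {
    const file = ZH_FILE_BY_CODE.get(lang.toLowerCase());
    if (file !== undefined) {
      return file;
    }
  }
  return null;
}

console.log(chineseL10nFile(["en-US", "zh-HK"])); // prints zh-hant.json
console.log(chineseL10nFile(["zh-yue"]));         // prints null
```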

Change 787874 had a related patch set uploaded (by Tranve; author: Tranve):

[wikimedia/portals@master] Reimplement Chinese translation support

https://gerrit.wikimedia.org/r/787874

I agree that zh-classical shouldn’t be included in the conversion feature, since no browser or operating system comes with a Classical Chinese localization anyway.

yue seems to be different: there’s a character converter on the Cantonese Wikipedia that switches between traditional and simplified characters. There aren’t separate MediaWiki localizations for Cantonese that correspond to traditional and simplified characters.

The original intention of the character conversion code was to respect these wikis’ approaches to diglossia, not merely to align with MediaWiki localizations. However, if we’re confident that the Cantonese Wikipedia doesn’t need character conversion for its strings on the portal, then the change above will simplify things a bit.

Change 787874 merged by jenkins-bot:

[wikimedia/portals@master] Reimplement Chinese translation support

https://gerrit.wikimedia.org/r/787874

The patch is merged. The bot will build the portal next Monday; let's see whether everything works by then.

Diskdance claimed this task.

Test environment: Chrome 101.0.4951.54 (Windows 10)

| Browser with a non-zh locale | Browser with locale set to Chinese (Simplified) | Browser with locale set to Chinese (Hong Kong) |
| --- | --- | --- |
| [screenshot] | [screenshot] | [screenshot] |
| ✔ Shows both Simplified and Traditional translations | ✔ Shows Simplified translation | ✔ Shows Traditional translation |

Everything looks fine now, so I'm closing this task. If related problems occur, feel free to reopen it.

Change 383340 abandoned by Jdrewniak:

[wikimedia/portals@master] Exposing zh variant translations

Reason:

https://gerrit.wikimedia.org/r/383340

Change 383339 abandoned by Jdrewniak:

[wikimedia/portals@master] Use the browsers full language codes for translation

Reason:

https://gerrit.wikimedia.org/r/383339

Change 383337 abandoned by Jdrewniak:

[wikimedia/portals@master] Exposing available translations in JS variable

Reason:

https://gerrit.wikimedia.org/r/383337

Change 383338 abandoned by Jdrewniak:

[wikimedia/portals@master] Checking if l10n available before translating page

Reason:

https://gerrit.wikimedia.org/r/383338