
Wikipedia portal: adjust the languages used for Chinese translations
Open, Low · Public


It looks like we might be handling the Chinese translations incorrectly: we currently look only at the primary part of the language code from translatewiki (zh) instead of zh-{{variant}}, so we're not showing all possible translations, such as zh-yue for Cantonese.

Let's take a look and see what we can do about this.

Event Timeline

debt created this task. · Jul 25 2017, 8:08 PM
Restricted Application added a subscriber: Aklapper. · Jul 25 2017, 8:08 PM

Change 374535 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Verifying l10n file exists before ajax request.

Change 383337 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Exposing available translations in JS variable

Change 383338 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Checking if l10n available before translating page

Change 383339 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Use the browsers full language codes for translation

Change 383340 had a related patch set uploaded (by Jdrewniak; owner: Jdrewniak):
[wikimedia/portals@master] Exposing zh variant translations

Change 374535 abandoned by Jdrewniak:
Verifying l10n file exists before ajax request.

abandoning this patch in favor of the patches on this topic

mxn added a subscriber: mxn. · Oct 11 2017, 4:56 AM
mxn added a comment. · Edited · Oct 11 2017, 5:12 AM

The way wm-portal.js handled the Chinese localization was by baking the Traditional Chinese strings into the page, along with Simplified Chinese versions in data-convert-hans and data-converttitle-hans attributes; convertChinese() (at 66831d2a556c51b52344454108129c8bcf286829$116) would then swap in the Simplified Chinese strings if the browser’s language was zh-hans, zh-cn, zh-sg, or zh-my.
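
The swap described above can be sketched roughly as follows. This is not the actual convertChinese() source: the helper names prefersSimplified and swapToSimplified are hypothetical, matching longer tags such as zh-Hans-CN by prefix is an assumption, and so is writing data-converttitle-hans into the title attribute.

```javascript
// Language tags named in the comment above as triggering the Simplified swap.
const SIMPLIFIED_TAGS = ["zh-hans", "zh-cn", "zh-sg", "zh-my"];

// True when the browser language is one of the tags above, or (assumption)
// a longer tag that begins with one of them, e.g. "zh-Hans-CN".
function prefersSimplified(browserLang) {
  const tag = String(browserLang).toLowerCase();
  return SIMPLIFIED_TAGS.some((t) => tag === t || tag.startsWith(t + "-"));
}

// Swap in the Simplified strings stored in the data-* attributes.
// `elements` is any iterable of DOM-like objects exposing getAttribute,
// setAttribute and textContent. Returns the number of strings swapped.
function swapToSimplified(elements, browserLang) {
  if (!prefersSimplified(browserLang)) return 0;
  let swapped = 0;
  for (const el of elements) {
    const text = el.getAttribute("data-convert-hans");
    const title = el.getAttribute("data-converttitle-hans");
    if (text !== null) {
      el.textContent = text;
      swapped++;
    }
    if (title !== null) {
      // Assumption: the converted title belongs in the title attribute.
      el.setAttribute("title", title);
      swapped++;
    }
  }
  return swapped;
}
```

In a browser this might be called as swapToSimplified(document.querySelectorAll("[data-convert-hans], [data-converttitle-hans]"), navigator.language).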

None of the major browsers have an official translation in Min Nan, so I’d expect few users to have configured their browsers to prefer Min Nan independently of the UI language. That said, I think it’s important to set lang="nan": lang="zh-min-nan" causes browsers to choose CJK fonts, whereas the Min Nan Wikipedia uses the Latin-based Pe̍h-ōe-jī alphabet exclusively. Someday browsers may recognize standard three-letter language codes for font substitution, but that certainly won’t happen for ad-hoc codes like zh-min-nan.

mxn added a comment. · Oct 11 2017, 5:47 AM

We created the JSON files in l10n from Module:Project portal/wikis, naming the files based on wiki subdomains (e.g., zh-yue.json). Meanwhile, translatewiki lays down JSON files named for ISO 639 codes (hence yue.json). So for some languages, we have some properties in one file and some in another. Had anyone translated the strings into Literary Chinese (lzh) there, we would’ve had both zh-classical.json and lzh.json for the same wiki.
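
To see which files end up describing the same wiki, one option is to normalize each file name to a single code and group by it. The sketch below is illustrative only: SUBDOMAIN_TO_ISO, canonicalCode and groupL10nFiles are hypothetical names, and the table contains just the three pairs mentioned in this task (zh-yue/yue, zh-classical/lzh, zh-min-nan/nan).

```javascript
// Wiki subdomains mentioned in this task and their ISO 639 counterparts.
const SUBDOMAIN_TO_ISO = new Map([
  ["zh-yue", "yue"],
  ["zh-classical", "lzh"],
  ["zh-min-nan", "nan"],
]);

// Map a subdomain-style code to its ISO 639 code; other codes pass through.
function canonicalCode(code) {
  const key = code.toLowerCase();
  return SUBDOMAIN_TO_ISO.has(key) ? SUBDOMAIN_TO_ISO.get(key) : key;
}

// Group l10n file names by the wiki they describe, so that pairs such as
// zh-yue.json and yue.json land in the same bucket.
function groupL10nFiles(fileNames) {
  const groups = new Map();
  for (const name of fileNames) {
    const code = canonicalCode(name.replace(/\.json$/, ""));
    if (!groups.has(code)) groups.set(code, []);
    groups.get(code).push(name);
  }
  return groups;
}
```

Any bucket holding more than one file name is a candidate for merging.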

@mxn wrote:

... if the browser’s language was zh-hans, zh-cn, zh-sg, or zh-my.

I don't think zh-my is needed as no actual users will use it.

@mxn good point on lang="nan"; I'll revert that in the patch here.
Also, to your point on yue.json and zh-yue.json: in this specific instance the translation strings don't collide, so these two files can be merged without conflict. If such a conflict did arise in the future, as it could with lzh, I suppose the newer file could take precedence?
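
A minimal sketch of that "newer file wins" merge, assuming each l10n file is a flat object of string keys and values (nested values would need a deeper merge). The function name mergeTranslations is hypothetical, and which file counts as newer is left to the caller.

```javascript
// Merge two translation objects. Keys present in both take the value from
// `newer`; colliding keys whose values differ are reported so they can be
// reviewed by hand.
function mergeTranslations(older, newer) {
  const merged = { ...older, ...newer };
  const conflicts = Object.keys(newer).filter(
    (key) =>
      Object.prototype.hasOwnProperty.call(older, key) &&
      older[key] !== newer[key]
  );
  return { merged, conflicts };
}
```

For yue.json and zh-yue.json the conflicts list would be expected to come back empty, since their strings are said not to collide.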

As for the language codes, I apologize for my unfamiliarity with what the correct mappings are. In Chrome, for example, zh-tw is described as "Chinese (traditional)", which I assumed could be mapped to the zh-classical wiki. As it stands, though, I'm still not sure for which browser language codes we should be showing the zh-classical wiki in the top ten.

For the rest of the Chinese language codes, would the following mapping be correct?

browser code              translation file
zh, zh-hant               zh-hant.json
zh-hans, zh-cn, zh-sg     zh-hans.json
zh-hk                     zh-yue.json + yue.json
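
Expressed as a lookup, the mapping proposed in the table above might look like the sketch below. The table is still an open question in this task, so the contents of PROPOSED_FILES are not settled; the name filesForBrowserLanguage is hypothetical, and falling back by dropping trailing subtags (zh-Hans-CN, then zh-hans, then zh) is an assumption rather than something the patches are known to do.

```javascript
// Browser language code -> translation files, copied from the proposed table.
const PROPOSED_FILES = new Map([
  ["zh", ["zh-hant.json"]],
  ["zh-hant", ["zh-hant.json"]],
  ["zh-hans", ["zh-hans.json"]],
  ["zh-cn", ["zh-hans.json"]],
  ["zh-sg", ["zh-hans.json"]],
  ["zh-hk", ["zh-yue.json", "yue.json"]],
]);

// Look up the full browser language code first, then retry with trailing
// subtags removed one at a time. Returns an empty array when nothing matches.
function filesForBrowserLanguage(browserLang) {
  const parts = String(browserLang).toLowerCase().split("-");
  while (parts.length > 0) {
    const files = PROPOSED_FILES.get(parts.join("-"));
    if (files) return files;
    parts.pop();
  }
  return [];
}
```

Under these assumptions a code missing from the table, such as zh-TW, falls back to the zh row and gets zh-hant.json.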

Currently, we're only serving the zh.json file for anyone who has a zh-* browser language. The zh-hant.json file seems to be a more complete translation than the zh.json file. Would it be preferable to serve that in place of zh.json, or should we combine the two?

debt changed the task status from Open to Stalled. · Nov 2 2017, 3:52 PM

If there is anyone that can help with this, we'd really appreciate the feedback! :)

debt lowered the priority of this task from Medium to Low. · Nov 28 2017, 4:38 PM
debt moved this task from Needs code review to Backlog on the Discovery-Portal-Sprint board.

Moving to the backlog, as we don't have a clear path forward at this time on how to deal with all the various Chinese language variants.

Moving to the backlog board for review at a future date.

debt removed Jdrewniak as the assignee of this task. · Dec 12 2017, 4:16 PM
debt added a subscriber: Jdrewniak.
94rain added a subscriber: 94rain. · Apr 24 2019, 2:17 AM
Aklapper changed the task status from Stalled to Open. · May 14 2020, 12:19 PM

The previous comments don't explain what/who exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks... → Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions on which info is needed. If this task is out of scope and nobody should ever work on this, then the task should be given the "Declined" status.)