
$wgTranslateBlacklist of zh-* on metawiki
Closed, ResolvedPublic

Description

Language converter is available on translated pages because of Title::getPageLanguage(), so it's possible to only translate messages in /zh and leave /zh-* alone. I propose adding zh-hans, zh-hant, zh-cn, zh-hk, zh-mo, zh-my, zh-sg, zh-tw to $wgTranslateBlacklist with a message like "Translate in zh please" to avoid duplicated manual work. (Where's the place to get local consensus if needed?)

By the way, the difference between meta and translatewiki is that meta's translations are used for on-wiki display, where the converter can be applied, while translatewiki's translations are exported to software strings...

See also [[m:Meta:Requests for bot status/Liangent-bot]]


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=49589

Details

Reference
bz37338

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 12:27 AM
bzimport set Reference to bz37338.
bzimport added a subscriber: Unknown Object (MLST).
liangent created this task. Jun 5 2012, 5:08 AM

(In reply to comment #1)

To get local consensus, you can try http://meta.wikimedia.org/wiki/Meta:Babel

This one's probably fine, Dereckson. Liangent is the maintainer of these features. Removed shellpolicy.

This should probably be set for all Wikimedia wikis in CommonSettings.php. That currently uses what's pasted below:

$wgTranslateBlacklist = array(
    '*' => array( 'en' => 'English is the source language.', ),
);

@liangent: Please propose a config with descriptions as above.

(In reply to comment #2)


This can be fine for zh. Maybe we also want it in other languages with variants.

$wgTranslateBlacklist = array(
    '*' => array(
        'en' => 'English is the source language.',
        'zh-hans' => 'Translate in zh please.',
        'zh-hant' => 'Translate in zh please.',
        'zh-cn' => 'Translate in zh please.',
        'zh-hk' => 'Translate in zh please.',
        'zh-mo' => 'Translate in zh please.',
        'zh-my' => 'Translate in zh please.',
        'zh-sg' => 'Translate in zh please.',
        'zh-tw' => 'Translate in zh please.',
    ),
);

These may be issues for this configuration change:

gerrit 9860: So users can see converted pages in their expected fonts.
gerrit 10085: Allow anons to see updated pages (not a blocker currently because of gerrit 8622).
gerrit 10243: So users don't need to select variant on every page view.

$wgTranslateBlacklist = array(
    '*' => array(
        'en' => 'English is the source language.',

        'gan-hans' => 'Translate in gan please.',
        'gan-hant' => 'Translate in gan please.',

        'ike-cans' => 'Translate in iu please.',
        'ike-latn' => 'Translate in iu please.',

        'kk-cyrl' => 'Translate in kk please.',
        'kk-latn' => 'Translate in kk please.',
        'kk-arab' => 'Translate in kk please.',
        'kk-kz'   => 'Translate in kk please.',
        'kk-tr'   => 'Translate in kk please.',
        'kk-cn'   => 'Translate in kk please.',

        'ku-latn' => 'Translate in ku please.',
        'ku-arab' => 'Translate in ku please.',

        'shi-tfng' => 'Translate in shi please.',
        'shi-latn' => 'Translate in shi please.',

        'sr-ec' => 'Translate in sr please.',
        'sr-el' => 'Translate in sr please.',

        'tg-latn' => 'Translate in tg please.',

        'zh-hans' => 'Translate in zh please.',
        'zh-hant' => 'Translate in zh please.',
        'zh-cn' => 'Translate in zh please.',
        'zh-hk' => 'Translate in zh please.',
        'zh-mo' => 'Translate in zh please.',
        'zh-my' => 'Translate in zh please.',
        'zh-sg' => 'Translate in zh please.',
        'zh-tw' => 'Translate in zh please.',
    ),
);

This is meant to be the full list of languages with variants available, but be careful with languages that have a -latn variant: text written in English (that is, untranslated text and the language selector) may be converted into unreadable text.
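Since every message in the list follows the same pattern, one possible way to keep such a list maintainable is to generate it from a map of parent language to variant codes. This is only a sketch: the `$variantParents` and `$blacklist` names are made up for illustration, and only two groups are shown (the zh group is the one this request is about).

```php
<?php
// Hypothetical helper map: parent language code => variant codes to block.
// Only two groups are shown here; the other groups would be added the same way.
$variantParents = array(
    'zh' => array( 'zh-hans', 'zh-hant', 'zh-cn', 'zh-hk', 'zh-mo', 'zh-my', 'zh-sg', 'zh-tw' ),
    'sr' => array( 'sr-ec', 'sr-el' ),
);

// Start with the entry that is already in the current configuration.
$blacklist = array( 'en' => 'English is the source language.' );

// Add one entry per variant, pointing translators at the parent language.
foreach ( $variantParents as $parent => $variants ) {
    foreach ( $variants as $variant ) {
        $blacklist[$variant] = "Translate in $parent please.";
    }
}

$wgTranslateBlacklist = array( '*' => $blacklist );
```

Generating the entries avoids per-line punctuation mistakes, at the cost of making the final list less obvious when reading the config file.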

Is it Extension:Translate's responsibility to treat untranslated and translated strings differently when displaying them (besides conversion, there's <div lang="***">)?

(In reply to comment #4)

Is it Extension:Translate's responsibility to treat untranslated and translated
strings differently when displaying them (besides conversion, there's <div
lang="***">)?

It probably is. I think Robin and Niklas may be able to reply here... If something goes wrong, it should be treated as a bug, I guess (and tracked in a different issue than this one).

(In reply to comment #5)


I created bug 37559 and bug 37557.

We have often had to delete such pages manually. If they are eligible for speedy deletion, this change is uncontroversial and not subject to consensus -> +shell.

Related URL: https://gerrit.wikimedia.org/r/63678 (Gerrit Change I95814d13f5e24a6b8027fac96d209fad53d74426)

Wait a minute. I'm not sure how Extension:Translate is used nowadays. If it's also used for CNBanners, then this fix is inappropriate, as conversion works on content pages only currently.

btw. about my config lines for languages other than zh, I'm not 100% sure.

Thehelpfulonewiki wrote:

(In reply to comment #9)

Wait a minute. I'm not sure how Extension:Translate is used nowadays. If it's also used for CNBanners, then this fix is inappropriate, as conversion works on content pages only currently.

Yeah, the intention is that CNBanners can now be translated directly through the Translate extension (but are only published if a translation admin marks them as published). See https://meta.wikimedia.org/w/index.php?title=Special:MessageGroupStats&filter=&group=Centralnotice-tgroup-Election2013_submission as an example.

Thehelpfulonewiki wrote:

(In reply to comment #10)

btw. about my config lines for languages other than zh, I'm not 100% sure.

There could also be de-* variants?

Yes, but de variants don't have automatic conversion.

li3939108 wrote:

(In reply to comment #14)

Liangent's list done in https://gerrit.wikimedia.org/r/#/c/63678/

I am a Wikidata user, and someone pointed me here when I asked about the disabled zh-hans.

I think it is better to add zh-cn, zh-hk, zh-mo, zh-my, zh-sg, zh-tw and zh to $wgTranslateBlacklist, and leave zh-hans and zh-hant alone.

There is currently no automatic conversion from zh to zh-hans and zh-hant on Wikidata, nor any bot to do this. Even if automatic conversion arrives in the future, the converter or a bot can simply convert between zh-hans and zh-hant.

Strictly speaking, zh-hans and zh-hant are not variants of zh but subsets of zh, and these two subsets overlap.
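For comparison with the list that was merged, the alternative suggested in this comment might look roughly like the following. This is a sketch only: the message wording is a placeholder, not text anyone in the thread proposed.

```php
<?php
// Sketch of the alternative: block zh and the six regional codes,
// leave zh-hans and zh-hant open for translation.
// The message text below is a placeholder, not proposed wording.
$wgTranslateBlacklist = array(
    '*' => array(
        'en' => 'English is the source language.',
        'zh' => 'Translate in zh-hans or zh-hant please.',
        'zh-cn' => 'Translate in zh-hans or zh-hant please.',
        'zh-hk' => 'Translate in zh-hans or zh-hant please.',
        'zh-mo' => 'Translate in zh-hans or zh-hant please.',
        'zh-my' => 'Translate in zh-hans or zh-hant please.',
        'zh-sg' => 'Translate in zh-hans or zh-hant please.',
        'zh-tw' => 'Translate in zh-hans or zh-hant please.',
    ),
);
```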

(In reply to comment #15)

There is no automatic conversion from zh to zh-hans and zh-hant, or any bot to do this in wikidata currently. Even if there will be automatic conversion in future, the automatic conversion or a bot can just convert between zh-hans and zh-hant.

The converter is generally expected to be able to accept any of the variants it knows (for zh, they're zh, zh-hans, zh-hant, zh-cn, zh-hk, zh-mo, zh-my, zh-sg and zh-tw), and output text in any other variant.

li3939108 wrote:

(In reply to comment #16)

The converter is generally expected to be able to accept any of variants it knows (for zh, they're zh, zh-hans, zh-hant, zh-cn, zh-hk, zh-mo, zh-my, zh-sg and zh-tw), and output text in any other variant.

Sorry, I didn't notice that the automatic converter has been deployed on Wikidata, and that a drop-down menu with all 9 zh variants has been added to non-item pages. Currently it only accepts zh; see http://www.wikidata.org/w/index.php?title=Help:Contents/zh-hans&variant=zh-hant and http://www.wikidata.org/w/index.php?title=Help:Contents/zh&variant=zh-hant

However, I think the drop-down menu is not needed on Wikidata. 6 of the 9 zh variants are just for dialects in Chinese Wikipedia. The non-item pages on Wikidata are instructions, policies and so on. Besides, the dialect conversion in Chinese Wikipedia could also be moved to Wikidata, just with the help of the multilingual labels.

If the following mechanism can be implemented, the drop-down menu is not needed. Suppose I choose Simplified Chinese (zh-hans) but part of the page is not yet translated to zh-hans, so the page is a mix of zh-hans and en. If in that case the page could automatically load the content from zh-hant rather than en, and automatically convert all the content from zh-hans and zh-hant to zh-hans, the drop-down menu would not be needed. zh (i.e. no conversion) pages are actually a mix of zh-hans and zh-hant, which I think no one will read.
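The lookup order described in this comment could be sketched as below. This is purely illustrative: `pickUnitSource` is a hypothetical function, not part of Extension:Translate or MediaWiki, and the conversion step itself (turning zh-hant text into zh-hans for display) is left out.

```php
<?php
/**
 * Hypothetical helper, not part of Extension:Translate.
 * Returns the language code whose stored text should be shown for one
 * translation unit: the requested variant first, then its sibling
 * variants, then the source language.
 */
function pickUnitSource( array $storedCodes, $requested, array $siblings, $source = 'en' ) {
    $candidates = array_merge( array( $requested ), $siblings, array( $source ) );
    foreach ( $candidates as $code ) {
        if ( in_array( $code, $storedCodes, true ) ) {
            return $code;
        }
    }
    // Nothing is stored for this unit at all.
    return null;
}

// A unit that exists only in en and zh-hant is served from zh-hant rather
// than en when zh-hans is requested; the text would still need converting.
$code = pickUnitSource( array( 'en', 'zh-hant' ), 'zh-hans', array( 'zh-hant' ) );
```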

(In reply to comment #17)

Sorry, I didn't noticed the automatic converter has been deployed in wikidata. And a drop-down menu, with all 9 zh variants, has been added to non-item pages. Currently it only accept zh. see http://www.wikidata.org/w/index.php?title=Help:Contents/zh-hans&variant=zh-hant and http://www.wikidata.org/w/index.php?title=Help:Contents/zh&variant=zh-hant

You may need to move existing /zh-hans or /zh-hant pages to /zh.

However, I think the drop-down menu is not needed in wikidata. 6 of the 9 zh variants are just for dialects in Chinese wikipedia. The non-item pages in wikidata are some instructions, policies ...

There is now [[mw:Manual:$wgDisabledVariants]], but I suspect it may affect any conversion work planned for later.
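As a rough illustration of that setting, hiding the six regional codes might look like the fragment below. Which variants to disable is an assumption made here for the example, not something decided in this thread, and the caveat above about later conversion work still applies.

```php
<?php
// Assumption for illustration: disable the six regional zh variants and
// keep zh, zh-hans and zh-hant available.
$wgDisabledVariants = array( 'zh-cn', 'zh-hk', 'zh-mo', 'zh-my', 'zh-sg', 'zh-tw' );
```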

Besides, The dialect conversion in Chinese wikipedia can also be moved to wikidata, just with the help of the multilingual labels.

If the following mechanism can be implemented, the drop-down menu is not needed. When I choose say Simplified Chinese(zh-hans) but part of page is not yet translated to zh-hans. Then the page is mixed with zh-hans and en. If in that case the page can automatically load the content from zh-hant rather than en, and automatically convert all the content from zh-hans and zh-hant to zh-hans. Then the drop-down menu will not be needed. zh(i.e. no conversion) pages is actually mixed with zh-hans and zh-hant, which I think no one will read it.

I'm planning to do it later for item pages. But for content pages... It's still needed.

li3939108 wrote:

(In reply to comment #18)
I have started a discussion (in Chinese) in the Wikidata project chat (http://www.wikidata.org/wiki/Wikidata:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88#.E5.BC.BA.E7.83.88.E5.BB.BA.E8.AE.AE.E5.BD.BB.E5.BA.95.E6.B8.85.E9.99.A4zh-cn.2Czh-tw.2Czh-hk.2Czh-sg.2Czh-my.2Czh-mo.E8.AF.AD.E8.A8.80.E4.BB.A3.E7.A0.81) about the 9 zh language codes, which I think have caused many problems on Commons. Basically, I hope there will be a small number of language codes for Chinese in multilingual projects like Commons and Wikidata. Even many Chinese speakers don't know what the 9 zh language codes mean. When writing scripts, foreigners often ignore some of the variants or use them in the wrong way, which causes bugs.

I'm planning to do it later for item pages. But for content pages... It's still needed.

For the item pages, I think we only need an automatic converter that runs when a human writes, not one that runs when a human reads. For example, when I enter a zh-hans label, it can automatically convert it to zh-hant and store both zh-hans and zh-hant. This would be very simple and could even be done with JavaScript or a bot, although it would cause a little redundancy.

The content pages really do need a converter that runs at reading time. Alternatively, this could also be resolved with a write-time converter as above. I think the redundancy can be tolerated if it is much simpler.

(In reply to comment #19)

For the item pages, I think we only need an automatic converter that runs when a human writes, not one that runs when a human reads. For example, when I enter a zh-hans label, it can automatically convert it to zh-hant and store both zh-hans and zh-hant. This would be very simple and could even be done with JavaScript or a bot, although it would cause a little redundancy.

The content pages really do need a converter that runs at reading time. Alternatively, this could also be resolved with a write-time converter as above. I think the redundancy can be tolerated if it is much simpler.

I'm planning to apply conversion on display.

btw Can we stop hijacking this bug and use bug 37461 and bug 36430 instead?

Bencmq added a comment. Jun 8 2013, 8:39 AM

Language conversion isn't available for CentralNotice banners yet, I'm afraid.

Bencmq added a comment. Jun 8 2013, 4:24 PM

This request was processed because "language converter is available on translated pages"; however, CNBanners still use the Translate extension. I need to translate in zh-hans and zh-hant since banners have no conversion support. Is there a way around this?

Please open a new ticket. Closing this again.