
The "hidden full-text manual conversion tag" -{H|}- fails with the zh-hans and zh-hant variants in MediaWiki 1.27.1
Open, Needs Triage · Public

Description

The "hidden full-text manual conversion tag" (隐藏式全文手工转换标签) fails in MediaWiki 1.27.1 with the zh-hans and zh-hant variants.

It is a tag in the following format, which is supposed to match the word given for [zh] and convert it to the word given for each Chinese variant (such as zh-hans):

-{H|zh:文字1;zh-hans:文字2;zh-hant:文字3;zh-cn:文字4;zh-tw:文字5;zh-hk:文字6;zh-sg:文字7;zh-mo:文字8;}-

On a newly installed MediaWiki 1.27.1 with $wgLanguageCode = "zh"; the -{H|}- tag does not convert 文字1 into 文字2 under zh-hans, or into 文字3 under zh-hant.

Example:
/index.php?title=Main_Page&variant=zh (no conversion. It should show 文字1, and it does.)

/index.php?title=Main_Page&variant=zh-hans (convert to zh-hans. It should show 文字2, but it is not converted here.)

/index.php?title=Main_Page&variant=zh-hant (convert to zh-hant. It should show 文字3, but it is not converted here.)

/index.php?title=Main_Page&variant=zh-cn (convert to zh-cn. This one and the remaining variants, zh-tw, zh-hk, zh-sg, and zh-mo, work as they should, so I'm not going to post more screenshots here.)

Event Timeline

Baskice created this task. Oct 27 2016, 1:43 AM
Restricted Application added a subscriber: Aklapper. Oct 27 2016, 1:43 AM
Baskice added a comment. Edited Oct 27 2016, 2:28 AM

By commenting out lines 141 and 142
'zh-hans' => 'unidirectional',
'zh-hant' => 'unidirectional',

in /languages/classes/LanguageZh.php, I can get the -{H|}- tag to work on a test wiki.
https://phabricator.wikimedia.org/diffusion/MW/browse/master/languages/classes/LanguageZh.php

However, this introduces a new bug: when $wgDisabledVariants is set, the rule text for the disabled variants shows up in the output.

Example:
-{H|zh:文字1;zh-hans:文字2;zh-hant:文字3;zh-cn:文字4;zh-tw:文字5;zh-hk:文字6;zh-sg:文字7;zh-mo:文字8;}-
with
$wgDisabledVariants = array( 'zh-cn', 'zh-tw', 'zh-hk', 'zh-mo', 'zh-my', 'zh-sg' );

will result in 文字2 in the zh-hans environment and 文字3;zh-cn:文字4;zh-tw:文字5;zh-hk:文字6;zh-sg:文字7;zh-mo:文字8 in the zh-hant environment.
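
One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: the rule body is split at a ";" only when it is followed by a variant code that is still enabled, so the entries for disabled variants stay attached to the entry before them. The Python sketch below is a toy model of that idea and is not MediaWiki's code; parse_rule, ALL_VARIANTS, DISABLED, and the separator regex are hypothetical names and choices made for illustration.

```python
import re

# Variant codes taken from the example above; the list itself is an assumption.
ALL_VARIANTS = ["zh", "zh-hans", "zh-hant", "zh-cn", "zh-tw",
                "zh-hk", "zh-sg", "zh-mo", "zh-my"]
DISABLED = ["zh-cn", "zh-tw", "zh-hk", "zh-mo", "zh-my", "zh-sg"]

BODY = ("zh:文字1;zh-hans:文字2;zh-hant:文字3;zh-cn:文字4;"
        "zh-tw:文字5;zh-hk:文字6;zh-sg:文字7;zh-mo:文字8;")


def parse_rule(body, known_variants):
    """Split a -{H|...}- rule body into a {variant: text} dict.

    Toy model: a ';' ends an entry only when it is followed by a known
    variant code and ':' (or by the end of the body).
    """
    # Longest codes first so that 'zh-hans' is tried before 'zh'.
    codes = "|".join(
        re.escape(v) for v in sorted(known_variants, key=len, reverse=True)
    )
    separator = re.compile(rf";\s*(?=(?:{codes})\s*:|\s*$)")
    rules = {}
    for entry in separator.split(body):
        if not entry.strip():
            continue  # skip the empty piece after the trailing ';'
        variant, _, text = entry.partition(":")
        rules[variant.strip()] = text
    return rules


# All variants known: every entry is split off cleanly.
full = parse_rule(BODY, ALL_VARIANTS)

# Disabled variants removed from the known list: their entries are no
# longer recognised as separate rules.
enabled = [v for v in ALL_VARIANTS if v not in DISABLED]
partial = parse_rule(BODY, enabled)

print(full["zh-hant"])
print(partial["zh-hant"])
```

In this model the zh-hant entry absorbs everything after it once the disabled codes are no longer recognised, which matches the text shown in the zh-hant environment above. Whether MediaWiki's converter behaves this way internally is not confirmed by this sketch.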

Aklapper added a subscriber: Shizhao.

@Shizhao: Do you plan to fix this, or why did you add the MW-1.27-release tag?

Zoglun updated the task description. Dec 27 2016, 4:54 PM
Arthur2e5 added a subscriber: Arthur2e5. Edited Mar 28 2017, 2:13 AM

I believe this is intentional, given the unidirectional setting. The original programmers mainly intended to provide regional variants rather than scripts for Chinese, since:

  1. The HK/TW split is really large.
  2. Hans/Hant literally only tells you about the set of characters something is written in.

The $wgDisabledVariants part is likely intentional as well: with these variants disabled, why pretend that you actually know something about them and parse their names?

If you are controlling your own wiki (I mean, well, if true), you can always ask your users not to use these nonexistent variants and set up abuse filters to warn them, assuming they are happy with seeing 義呆利 instead of 意呆利.

This bug may nevertheless be worth keeping as a way to tell site deployers why MediaWiki uses variants for Chinese by default. Suggestion: close as invalid (by design).

Zoglun added a subscriber: Zoglun. Mar 28 2017, 2:43 AM

I would suggest documenting a systematic explanation of the conversion function on mediawiki.org, or at least in the file's code comments.

Information about how the Chinese language conversion system works is scattered, fragmentary, and sometimes contradictory across mediawiki.org, zh.wikipedia.org, meta.wikimedia.org, and the code comments.

https://www.mediawiki.org/wiki/Writing_systems/Syntax?

Arthur2e5 added a comment. Edited Mar 28 2017, 3:48 PM

To be fair, the syntax page mainly covers usage and user-visible behavior (which is not thoroughly documented either), not the internal classes that implementers actually work on. Dev docs should probably just go into the source as comments so that they are available in Doxygen. Patchy-patchy time for T21044.