
'幺' => '么' in ZhConversion.php is wrong.
Closed, ResolvedPublic

Description

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/ZhConversion.php?view=markup&pathrev=100226

In 10160

The rule '幺' => '么' is a wrong conversion. Chinese Wikipedia works around this with http://zh.wikipedia.org/wiki/MediaWiki:Conversiontable/zh-hans#.E5.85.B6.E4.BB.96_2 . I am not sure why they keep using the less efficient database-based conversion instead of updating ZhConversion.php.

By the way, how can I fix this problem myself? I mean, is it possible for me to change MediaWiki's git source code?


Version: 1.22.0
Severity: minor

Details

Reference
bz47029

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:23 AM
bzimport set Reference to bz47029.

(In reply to comment #0)

By the way, how can I fix these problem by myself? I mean is it possible for me to change Mediawiki's git source code?

Yes, you can create an account so you can submit code updates. Start at https://www.mediawiki.org/wiki/Developer_access

https://www.google.com/search?q=%22%E4%BB%80%E5%B9%BA%22+site%3Atw&ie=utf-8&oe=utf-8&

It seems this rule is still needed in some cases. On Wikipedia we can just ask editors from Taiwan not to use '幺' in this way, but the situation differs from wiki to wiki. Also, '幺' is much less used than '么', so we keep a list (see below) of words containing '幺' to which the '幺' => '么' conversion is not applied.

Here's the list:

simpphrases.manual:幺厮
simpphrases.manual:幺半群
simpphrases.manual:幺元
simpphrases.manual:幺爹
simpphrases.manual:幺叔
simpphrases.manual:幺舅
simpphrases.manual:幺爸
simpphrases.manual:幺妈
simpphrases.manual:幺姨
simpphrases.manual:幺娘
simpphrases.manual:幺妹
simpphrases.manual:幺小
simpphrases.manual:幺姓
simpphrases.manual:姓幺
simpphrases.manual:幺氏
simpphrases.manual:幺蛾子
simpphrases.manual:幺麽
simpphrases.manual:幺麽小丑
simpphrases.manual:幺凤
simpphrases.manual:幺二三
simpphrases.manual:幺篇
simpphrases.manual:幺谦
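
To illustrate how such an exception list can coexist with the single-character rule, here is a minimal Python sketch. It is not MediaWiki's implementation: the `convert` function and the `ZH2HANS_SKETCH` table are hypothetical, and the sketch assumes the converter prefers the longest matching key at each position (PHP's `strtr()` with an array argument behaves this way). Each protected word is entered as a phrase that maps to itself, so it wins over the shorter '幺' => '么' rule.

```python
def convert(text, table):
    """Greedy longest-match replacement, scanning left to right."""
    longest = max(map(len, table))
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            out.append(text[i])  # no rule applies: copy the character through
            i += 1
    return "".join(out)


# Hypothetical excerpt: one character rule plus identity entries
# for a few of the protected words listed above.
ZH2HANS_SKETCH = {
    "幺": "么",
    "幺元": "幺元",
    "幺半群": "幺半群",
    "幺妹": "幺妹",
}

print(convert("什幺是幺半群", ZH2HANS_SKETCH))
```

In this sketch the first 幺 is converted by the character rule, while 幺半群 is left alone because the longer phrase entry matches first.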

I think that for an unusual simplified-to-traditional conversion, it is better to write out the few affected words than to maintain a long conversion list.

For example:
幺: do not convert
simpphrases.manual:什么 => 什幺

This is the same solution you use for 发. 发 in Simplified Chinese can mean either 發 or 髪 in Traditional Chinese. ZhConversion.php has 发 => 發 and then 头发 => 頭髪, because 發 is used more often than 髪.
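
The "default character rule plus phrase override" pattern described here can be sketched as follows. This is an illustration only, not the code in ZhConversion.php: `convert` and `ZH2HANT_SKETCH` are hypothetical names, the entries are limited to the characters quoted in this comment plus two helper rules (头, 现) added for the demo sentence, and longest-match-first behaviour is an assumption of the sketch.

```python
import re


def convert(text, table):
    """Replace every key of `table` found in `text`, longest keys first."""
    # Sorting by length puts phrases before single characters in the
    # alternation, so a phrase rule beats the character rule it contains.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: table[m.group(0)], text)


# Hypothetical zh-hans to zh-hant excerpt, using the characters from the comment.
ZH2HANT_SKETCH = {
    "发": "發",      # default: the more common reading
    "头发": "頭髪",  # phrase override for the less common reading
    "头": "頭",
    "现": "現",
}

# Same table without the phrase override, for comparison.
WITHOUT_OVERRIDE = {k: v for k, v in ZH2HANT_SKETCH.items() if len(k) == 1}

print(convert("我发现头发", ZH2HANT_SKETCH))
print(convert("我发现头发", WITHOUT_OVERRIDE))
```

With the override, 头发 is converted as a unit; without it, the default 发 => 發 rule is applied to both occurrences.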

(In reply to comment #3)

I think for an unusual simple to traditional convert, it is better to write those few words out, instead of maintenance a long convert list.

like:
幺 do not convert
simpphrases.manual:什么 => 什幺

It is the same solution you guys do with 发. 发 in simple could either means 發 and 髪 in traditional. It wrote 发 =>發 and then 头发 =>頭髪 in ZhConversion.php because people use 發 more than 髪.

Are you talking about zh-hans to zh-hant conversion or the other way around?

'幺' => '么' exists in the $zh2Hans conversion table in ZhConversion.php.

Anyway, can you give a sentence that fails with the current rules?

I am talking about zh-hant to zh-hans for '幺' => '么'. I used 发 in zh-hans to zh-hant only as an example.

Using '幺' to mean '么' is currently very rare in Traditional Chinese; even Chinese Wikipedia disables the '幺' => '么' conversion. So this character should not be converted to '么' in the MediaWiki source code.

什么 => 什幺 and 什幺 => 什么 should be added for both directions (zh-hant to zh-hans and zh-hans to zh-hant), instead of keeping the long list of words in which 幺 is left unchanged.

(In reply to comment #5)

I am talking , zh-hant to zh-hans for '幺' => '么'. I use the 发 in zh-hans to zh-hant as example.

Currently '幺' means '么' is very rare in traditional Chinese, even zh wikipedia forbid '幺' => '么' convert. So this word should not convert to '么' in mediawiki source code.

什么 => 什幺 and 什幺 => 什么 should be add for both zh-hant to zh-hans and zh-hans to zh-hant, instead of the long 幺 not change list.

幺 looks like just a variant character (异体字) of 么 in zh-hant, so it is not often seen (correct me if I'm wrong). However, it is difficult to list every usage of 么 in Chinese. What rule would you add for the following sentence: 你认识zoglun么? when it is written in zh-hant with 么 written as 幺?
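
The concern can be made concrete with a small Python sketch. Everything here is hypothetical: the `convert` helper, both tables, and the assumed zh-hant spelling 你認識zoglun幺? of the sentence above. The 認 and 識 entries are included only so the rest of the sentence converts. The sketch compares a phrase-only table (the proposal from comment #5) with a table that also keeps the character rule.

```python
import re


def convert(text, table):
    """Replace every key of `table` found in `text`, longest keys first."""
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: table[m.group(0)], text)


# Proposal: no rule for 幺 alone, only the phrase 什幺 => 什么.
PHRASE_ONLY = {"什幺": "什么", "認": "认", "識": "识"}

# Current approach: the character rule '幺' => '么' is kept.
WITH_CHAR_RULE = {**PHRASE_ONLY, "幺": "么"}

sentence = "你認識zoglun幺?"  # assumed zh-hant spelling with 么 written as 幺

print(convert(sentence, PHRASE_ONLY))     # sentence-final 幺 is left unconverted
print(convert(sentence, WITH_CHAR_RULE))  # sentence-final 幺 becomes 么
```

Under these assumptions, the phrase-only table handles 什幺 but cannot reach a sentence-final 幺, because no fixed phrase precedes it.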