
MediaWiki $minimumGroupingDigits differs from CLDR for hy, ru, uk
Closed, Declined (Public)

Description

MediaWiki's numeric formats set minimumGroupingDigits to 2 for hy, ru, and uk, which means that 1000 is written without a grouping separator. The Unicode CLDR defines the value as 1 for all three languages.

So we should probably shift our language definitions to match the Unicode definition (or lobby Unicode to fix CLDR?).

See http://unicode.org/reports/tr35/tr35-numbers.html#Examples_of_minimumGroupingDigits and Ic721b9a91e78e4ef07040339d1006b7a90a910c0.
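For illustration, the rule behind the TR35 examples linked above can be sketched in a few lines of Python. This is not MediaWiki's implementation: the function name `format_grouped` and its parameters are hypothetical, and the sketch assumes integers only and a single fixed group size of 3.

```python
def format_grouped(n: int, min_grouping_digits: int,
                   sep: str = ",", group_size: int = 3) -> str:
    """Group the digits of n, honouring a minimumGroupingDigits-style setting.

    Hypothetical helper for illustration only. Grouping is applied only when
    at least min_grouping_digits digits would sit to the left of the first
    (leftmost) separator.
    """
    sign = "-" if n < 0 else ""
    digits = str(abs(n))

    # Too short: leave the number ungrouped.
    if len(digits) < group_size + min_grouping_digits:
        return sign + digits

    # Peel off groups of group_size digits from the right.
    groups = []
    while len(digits) > group_size:
        groups.insert(0, digits[-group_size:])
        digits = digits[:-group_size]
    groups.insert(0, digits)
    return sign + sep.join(groups)


print(format_grouped(1000, 1))   # 1,000  (value 1, as in CLDR for hy, ru, uk)
print(format_grouped(1000, 2))   # 1000   (value 2, as in MediaWiki for hy, ru, uk)
print(format_grouped(10000, 2))  # 10,000
```

With a value of 2, grouping starts at five digits, which is why 1000 stays ungrouped while 10000 does not.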

Event Timeline

Change 626424 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Use Unicode-compliant definition of $minimumGroupingDigits

https://gerrit.wikimedia.org/r/626424

Hello! Is the Language team expected to review this patch? If so, is this time sensitive?

MediaWiki's code for minimumGroupingDigits in numeric formats has an off-by-one error: MediaWiki uses (for example) 2 for Polish to indicate that 1000 shouldn't have a grouping separator, but the Unicode CLDR defines this as 1.

As far as I can see, CLDR defines minimumGroupingDigits as 2 for pl. See https://github.com/unicode-org/cldr/blob/master/common/main/pl.xml#L5780

Huh. You're right. CLDR does define minimumGroupingDigits as 1 for hy, ru, and uk. It looks like I misinterpreted the comments left in the patch I took over (ce8d0e9599a84565d53965481d1c163a90c4e6dd) as a commentary on the definition of minimumGroupingDigits, not on the correctness of our hy/ru/uk settings. I'll update the patches and this task title to reflect correcting the discrepancy between CLDR and MediaWiki for hy/ru/uk.

cscott renamed this task from "MediaWiki $minimumGroupingDigits is off-by-one" to "MediaWiki $minimumGroupingDigits differs from CLDR for hy, ru, uk". Oct 22 2020, 3:18 PM
cscott updated the task description.

Change 626424 merged by jenkins-bot:
[mediawiki/core@master] Correct misinterpretation of $minimumGroupingDigits

https://gerrit.wikimedia.org/r/626424

Someone who speaks Russian just needs to tell us how to write numbers in Russian. Is it 1000 or 1 000?

I think that for all three languages, the separator is supposed to be a space (thin and non-breaking if possible). As for the minimum number at which grouping starts, both 1000 and 10000 are possible.

For Ukrainian, 10000 is probably the preferred minimum. Here's a math textbook that a Ukrainian user sent me: http://8next.com/matemat/3759-matemat15.html

For Russian, I can't find a clear preference anywhere. I see websites that say that both 1000 and 10000 are possible, and no solid standard. I don't know Armenian, but an Armenian Wikimedian told me in a casual Telegram chat that the situation is the same for Armenian.

My own intuition as a Russian speaker is that it's safe to set 10000 as the minimum, but I cannot base it on a written standard from the Russian Academy of Sciences, Rosstandart or a comparable institution. The same should probably be done for the other two languages. If anyone disagrees and can cite sources for that, I won't argue.

(The separator definitely must not be a comma. The comma is used as the decimal separator: 1,5 is one and a half.)

Thanks Amir!

The preferred way to submit a change request to CLDR is via the survey tool. Currently for Ukrainian, Apple and Microsoft have apparently voted for minimumGroupingDigits=1. As TC members they get 6 points each, so there are 12 points for minimumGroupingDigits=1. To change it to 2: if Wikimedia joined Unicode as a regular member, its vote would count for 4 points, and if Amir joined as a guest, his vote would count for 1 point. The survey tool says that 50 points are needed to change the item, and committee review may also be needed. According to the process, the 50-vote threshold is used for committee-approved items.
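To make the arithmetic above concrete, here is a small Python tally using only the point values quoted in this comment. It is not CLDR's actual vote-resolution procedure: the variable names are hypothetical, and the way the threshold and the competing votes are combined in `would_change` is a simplifying assumption.

```python
# Point values as quoted in the comment above (Ukrainian, minimumGroupingDigits).
votes_for_1 = {"Apple (TC member)": 6, "Microsoft (TC member)": 6}

# Hypothetical votes for a value of 2, if both parties joined and voted.
votes_for_2 = {"Wikimedia (regular member)": 4, "Amir (guest)": 1}

# Threshold reported by the survey tool for changing this item.
THRESHOLD = 50

points_for_1 = sum(votes_for_1.values())
points_for_2 = sum(votes_for_2.values())

# Assumption: a change needs to reach the threshold and outscore the current value.
would_change = points_for_2 >= THRESHOLD and points_for_2 > points_for_1

print(points_for_1, points_for_2, would_change)  # 12 5 False
```

Under these numbers, the two hypothetical votes would add up to 5 points, against 12 points for the current value and a threshold of 50.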

Change 942526 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Don't defer to CLDR for ru, uk, hy minimumGroupingDigits

https://gerrit.wikimedia.org/r/942526

Change 942526 merged by jenkins-bot:

[mediawiki/core@master] Don't defer to CLDR for ru, uk, hy minimumGroupingDigits

https://gerrit.wikimedia.org/r/942526

I'm closing this as declined, because I don't think it would be beneficial for MediaWiki to change to match CLDR, and the process for changing CLDR is hostile to this sort of contribution.

My commit, linked above, updated the code comments accordingly.

CLDR can be viewed as a style guide. Different speakers of a language may have different views on minor points of style. The preferred style in a region may change over time. CLDR does not capture such details.