Page MenuHomePhabricator

Replace formatnum implementation with PHP NumberFormatter
Closed, ResolvedPublic

Description

We implement commafy, digit transformation based on current language, and decimal separator changes in Language.php. The commafy is now based on the CLDR Unicode TR 35 based digit grouping patterns. But the commafy implementation does not conform completely to TR35 spec. For example, the Language.php commafy method cannot parse a valid pattern like #,##0.###

I propose to replace the number formatting implementation to use PHPs' NumberFormatter class which is more advanced and standard compliant number formatter.

Fallback: If NumberFormatter is not available, fallback to simple digit grouping of #, ##0.### format(English) and avoid our own implementation. Use digit transformation and separator table transformation.

Known Issue Arq - Algerian Spoken Arabic - has different number grouping separator and decimal separator than what is present in CLDR.

Can we remove the $digitTransformTable setting?: Not yet since the frontend uses this for client side number formatting. Also separatorTransformTable is required for commafy $digitGroupingPattern is required for commafy. But the values of all these variables can come from NumberFormatter since it allows querying for pattern for a locale/language so that the frontend and backend implementation are not working on different data.

Event Timeline

santhosh created this task.Jun 6 2017, 5:19 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 6 2017, 5:19 AM

Change 357000 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/core@master] [WIP] Correct the commafy implementation as Unicode TR35 specification

https://gerrit.wikimedia.org/r/357000

Aklapper renamed this task from Replace formatnum implmentation with PHP NumberFormatter to Replace formatnum implementation with PHP NumberFormatter .Jun 6 2017, 8:42 AM

Change 384006 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/core@master] Update formatNum implmentation to match tr35 and latest CLDR

https://gerrit.wikimedia.org/r/384006

Change 357000 abandoned by Santhosh:
[WIP] Correct the commafy implementation as Unicode TR35 specification

Reason:
Abandoning in favor of https://gerrit.wikimedia.org/r/#/c/384006

https://gerrit.wikimedia.org/r/357000

Nemo_bis triaged this task as Medium priority.Oct 16 2017, 10:22 AM
Nemo_bis updated the task description. (Show Details)
Nemo_bis updated the task description. (Show Details)

One thing to consider is to have consistent formatting both in PHP and JavaScript. For this we actually need generic formatters in both languages, for which we can pass the actual format string (defined in CLDR or specified locally in MediaWiki).

cscott added a subscriber: cscott.Aug 31 2020, 7:30 PM

See also T237467: Invariant failed: Bad UTF-8 (full string verification) which was tracked down to a bug in the current commafy implementation that produced corrupt UTF-8 sequences.

One thing to consider is to have consistent formatting both in PHP and JavaScript. For this we actually need generic formatters in both languages, for which we can pass the actual format string (defined in CLDR or specified locally in MediaWiki).

Possibly we could just create a very thin action API to do the formatting in any 'unusual' cases, which would avoid having duplicate code and potential drift between them. If structured correctly, the AJAX query/responses should be very cache-able.

cscott added a comment.Sep 9 2020, 1:57 AM

Interesting issues discovered during T237467 is that commafy/formatNum have been used to apply to arbitrary strings, not just numeric strings, and there's a decent amount of sloppiness wrt whether the string it applies to is in latin digits or native digits/separators. This is a side-effect of there being no difference between them in us/euro wikis, presumably, so most people "don't worry" about whether they are using formatNum correctly. Anyway, the cleanups to formatNum/commafy for T237467 are tightening the interfaces which will hopefully make this task easier as well.

Change 384006 merged by jenkins-bot:
[mediawiki/core@master] Update formatNum implementation to match tr35 and latest CLDR

https://gerrit.wikimedia.org/r/384006

cscott closed this task as Resolved.Oct 23 2020, 1:48 AM
cscott claimed this task.

Using NumberFormatter seems to cause a loss of precision in numbers that MediaWiki could previously format accurately:

> echo $wgLang->formatNum( '9999999999999999999' ) . PHP_EOL;
9,999,999,999,999,999,999 # 1.35
10,000,000,000,000,000,000 # master
> echo $wgLang->formatNum( '3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938' ) . PHP_EOL;
3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938 # 1.35
3.1415926535897930000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 # master

This seems to be the cause of T268456: Value stored and value displayed are different for large numbers.