
{{formatnum:}} magic word loses precision since MediaWiki 1.36(?)
Closed, Resolved, Public

Description

In old MediaWiki versions, {{formatnum:}} was able to format integers of arbitrary size or floating-point numbers of arbitrary precision. For instance, on an ancient MediaWiki 1.18 install, we can test the formatting of

{{formatnum:9999999999999999}}

{{formatnum:3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938}}

and see that the result is:

<p>9,999,999,999,999,999
</p><p>3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938
</p>

(The floating-point example would be more interesting if it were formatted for a locale with different digits, but I don’t know a way to force that via a URL parameter – ?uselang doesn’t affect the page content language.)

On the other hand, on current MediaWiki in Wikimedia production, the result is instead:

<p>10,000,000,000,000,000
</p><p>3.1415926535897930000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
</p>

I believe this is a consequence of T167088, and therefore that the behavior changed in MediaWiki 1.36, though I haven’t tested other MediaWiki versions.
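For illustration, the same rounding can be reproduced outside MediaWiki. The sketch below is Python, not MediaWiki code, and it assumes that the input passes through a 64-bit floating-point value somewhere in the formatting path. `format_via_float` and `format_exact` are hypothetical helper names used only for this example.

```python
# Illustration only: assumes the precision loss comes from a 64-bit float
# conversion. The helper names are hypothetical, not MediaWiki functions.

def format_via_float(text: str) -> str:
    """Group digits after converting the input to a 64-bit float (lossy)."""
    return format(int(float(text)), ",")


def format_exact(text: str) -> str:
    """Group digits using arbitrary-precision integer arithmetic (exact)."""
    return format(int(text), ",")


# First digits of the long decimal from the task description.
PI_DIGITS = "3.14159265358979323846264338327950288"

print(format_via_float("9999999999999999"))  # 10,000,000,000,000,000
print(format_exact("9999999999999999"))      # 9,999,999,999,999,999
print(repr(float(PI_DIGITS)))                # 3.141592653589793
```

9999999999999999 is larger than 2^53, so it cannot be represented exactly as a 64-bit float and is rounded to the nearest representable value, 10000000000000000. Likewise, only about 17 significant digits of the long decimal survive the conversion.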

A fix for this might eventually arrive in PHP request #76093, though attempts to implement it have not been successful so far.

Event Timeline

In a way, this task is old news, but I at least hadn’t previously realized that it affected not just Wikibase (⇒ T268456) but also ordinary wikitext. And I was surprised to see no mention of this issue in the documentation (neither on mediawiki.org nor on en.wikipedia.org), so I thought it made sense to have a Phabricator task for it.

One idea that came up in code review on this change is to add a flag or parameter to the magic word that implements a trick similar to the one we used in Wikibase: format the number, parse it back, and if the result is the same as the input, use the formatted string; otherwise use the original string. This way, the number is formatted nicely if that’s possible without losing precision, and is otherwise displayed exactly (but unformatted). (We could even make this the default behavior of the magic word, but a flag feels safer.) So {{formatnum:9999999999999999}} would still format to 10,000,000,000,000,000 (or 10٬000٬000٬000٬000٬000 or ১০,০০,০০,০০,০০,০০,০০,০০০ etc.), but {{formatnum:9999999999999999|REVERSIBLE}} (or any other flag name) would format to 9999999999999999 instead (in all languages).
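A minimal sketch of that format, parse back, and compare idea, written in Python rather than PHP and not taken from any patch. `lossy_format` is a stand-in for a float-based formatter, and all three function names are hypothetical.

```python
from decimal import Decimal

# Sketch of the round-trip check described above. All names are hypothetical;
# lossy_format only stands in for a formatter that goes through a 64-bit float.


def lossy_format(text: str) -> str:
    """Group digits with commas, rounding the input to a 64-bit float first."""
    value = float(text)
    if value.is_integer():
        return format(int(value), ",")
    return format(value, ",")


def parse_formatted(formatted: str) -> Decimal:
    """Parse a comma-grouped number back into an exact decimal value."""
    return Decimal(formatted.replace(",", ""))


def lossless_format(text: str) -> str:
    """Return the formatted string only if it round-trips to the input value."""
    formatted = lossy_format(text)
    if parse_formatted(formatted) == Decimal(text):
        return formatted
    # Formatting changed the value: fall back to the unformatted input.
    return text


print(lossy_format("9999999999999999"))     # 10,000,000,000,000,000
print(lossless_format("9999999999999999"))  # 9999999999999999
print(lossless_format("1234567"))           # 1,234,567
```

The comparison uses exact decimal arithmetic, so any rounding introduced by the formatter is detected. Inputs that survive the round trip are still grouped as usual.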

Change #1143059 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/core@master] Add `LOSSLESS` option to `formatnum` parser function

https://gerrit.wikimedia.org/r/1143059

Change #1145237 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Arthur taylor):

[mediawiki/core@master] WIP: Add `LOSSLESS` option to `formatnum` parser function

https://gerrit.wikimedia.org/r/1145237

Change #1143059 abandoned by Arthur taylor:

[mediawiki/core@master] Add `LOSSLESS` option to `formatnum` parser function

Reason:

Abandoned in favour of I2372dcec6f

https://gerrit.wikimedia.org/r/1143059

Change #1145237 merged by jenkins-bot:

[mediawiki/core@master] Add `LOSSLESS` option to `formatnum` parser function

https://gerrit.wikimedia.org/r/1145237

Is it documented anywhere how the LOSSLESS flag works and fixes this issue? I saw the patch and it is not clear what it does. E.g. there is no test case for a language with different digits or separators.

Hi! Sorry - I might have been a bit hasty with the +2 - my bad. We didn't get any feedback on the ticket or the draft patches and we found an approach that worked for Wikibase without interfering with any other functionality. But I understand from @Lucas_Werkmeister_WMDE that it would have been better to wait for more explicit feedback.

What would you like to see for test cases and documentation? Specifically, which languages and where should the documentation go?

> Specifically, which languages

I would suggest adding tests for Bengali (bn), because it uses both different digits and a different digit grouping from the default.
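To show what makes bn a useful test case, here is a rough Python sketch of the two differences: Bengali digits, and grouping in which the last three digits form one group and the remaining digits are grouped in pairs. It is a hand-rolled illustration, not how MediaWiki formats numbers; `group_indian`, `format_bn` and `parse_bn` are hypothetical names.

```python
# Hand-rolled illustration of Bengali-style number formatting.
# Not MediaWiki code; all names here are hypothetical.

BENGALI_DIGITS = str.maketrans("0123456789", "০১২৩৪৫৬৭৮৯")


def group_indian(digits: str) -> str:
    """Group the last three digits together, then pairs of digits before them."""
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.append(head[-2:])
        head = head[:-2]
    return ",".join(reversed(groups)) + "," + tail


def format_bn(digits: str) -> str:
    """Format a string of ASCII digits with Bengali digits and grouping."""
    return group_indian(digits).translate(BENGALI_DIGITS)


def parse_bn(formatted: str) -> int:
    """Parse the formatted string back; int() accepts Unicode decimal digits."""
    return int(formatted.replace(",", ""))


print(format_bn("10000000000000000"))  # ১০,০০,০০,০০,০০,০০,০০,০০০
print(parse_bn(format_bn("9999999999999999")))  # 9999999999999999
```

A round-trip check for such a language has to undo both the digit mapping and the grouping before comparing against the input.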

Change #1146630 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/core@master] Add language=bn test for lossless formatnum

https://gerrit.wikimedia.org/r/1146630

I uploaded a patch for a bn test, though there’s not that much to see. What the LOSSLESS flag does is test if the formatted number can be parsed back into the same number, and if not, emit it unformatted instead. Using this flag effectively declares that you’d rather have an exact number at the cost of sometimes not having it formatted at all.

Change #1146630 merged by jenkins-bot:

[mediawiki/core@master] Add language=bn test for lossless formatnum

https://gerrit.wikimedia.org/r/1146630