
Value stored and value displayed are different for large numbers
Closed, Resolved | Public | 5 Estimated Story Points

Description

As an editor I want to enter a numeric value and see that the same number is displayed.

Problem:
For some large numbers, the numeric value (P1181) is rounded up and the remaining digits are replaced by zeros.

Examples:

Acceptance criteria:

  • stored and displayed value are the same

Previous discussion:
Wikidata:Contact the development team#Large quantity values

Event Timeline

Lydia_Pintscher renamed this task from "Value stored and value displayed are different" to "Value stored and value displayed are different for large numbers". Nov 23 2020, 2:46 PM
Lydia_Pintscher triaged this task as High priority.
Lydia_Pintscher updated the task description.
Lydia_Pintscher moved this task from Incoming to Unconnected Stories on the Wikidata-Campsite board.

Q67174314#P1181

I purged this page and now it also has a lot of 0s, so this seems to be a recent regression?

WMDE-leszek set the point value for this task to 5. Nov 26 2020, 11:28 AM

The Wikibase number formatting code seems to use MediaWiki’s Language class (through MediaWikiNumberLocalizer), which was recently modified with several changes by @cscott. Bisecting MediaWiki core is probably a good way to start looking into this task. (I haven’t done that yet.)

From task inspection: This seems to be reproducible locally by creating a Property with data type: quantity and then entering a number similar to the one from the example in the task description.

Recent changes to formatNumInternal() and formatNum() in the MediaWiki Language class introduced the bug.

Do we have tickets for reverting those changes? Or commits that introduced the issue? Or do we need to adapt our code to the changes that have been made?
It's unclear to me what we need to do to get this resolved.

git bisect puts the blame on https://gerrit.wikimedia.org/r/384006, though it looks like the grouping of large numbers was already broken prior to that (and maybe that was one thing the commit meant to fix):

Screenshot_2020-12-04 Douglas Adams.png (476×912 px, 24 KB)

But after that commit, we can observe the loss of precision:
Screenshot_2020-12-04 Douglas Adams.png (453×912 px, 22 KB)

It looks like this is a limitation of the PHP NumberFormatter class, which MediaWiki uses since that commit:

>>> $nf = new NumberFormatter( 'en', NumberFormatter::DECIMAL );
>>> $nf->format( '999999999999999999' ) # fits in signed 64-bit integer
=> "999,999,999,999,999,999"
>>> $nf->format( '9999999999999999999' ) # does not fit in signed 64-bit integer
=> "10,000,000,000,000,000,000"

Previously, MediaWiki seems to have implemented this digit grouping using a lot of string manipulation, without ever parsing the string as a number.
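For illustration, string-only digit grouping of the kind described above could look roughly like the sketch below. This is not MediaWiki's former implementation; groupDigits() is a hypothetical helper, and it assumes a plain ASCII decimal string with a fixed group size of three.

```php
<?php
// Hypothetical helper, NOT MediaWiki's actual code: group the integer digits
// of a decimal string in threes without ever converting it to int or float.
function groupDigits( string $number, string $separator = ',' ): string {
	// Split into optional sign, integer digits, and optional fractional part.
	if ( !preg_match( '/^([+-]?)(\d+)(\.\d+)?\z/', $number, $m ) ) {
		return $number; // not a plain decimal string; leave it untouched
	}
	$sign = $m[1];
	$integer = $m[2];
	$fraction = $m[3] ?? '';
	// Insert the separator wherever a multiple of three digits remains to the right.
	$grouped = preg_replace( '/\B(?=(\d{3})+\z)/', $separator, $integer );
	return $sign . $grouped . $fraction;
}

echo groupDigits( '9999999999999999999' ), "\n";
```

Because the value is never parsed as a number, no digit can be rounded away, whatever the length of the input.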

It looks like this is a limitation of the PHP NumberFormatter class

Request #76093 seems to be the relevant PHP issue (not yet resolved even in latest PHP).

Change 649638 had a related patch set uploaded (by Rosalie Perside (WMDE); owner: Rosalie Perside (WMDE)):
[mediawiki/core@master] Revert "Update formatNum implementation to match tr35 and latest CLDR"

https://gerrit.wikimedia.org/r/649638

Change 649638 abandoned by Rosalie Perside (WMDE):
[mediawiki/core@master] Revert "Update formatNum implementation to match tr35 and latest CLDR"

Reason:

A new patch will be uploaded

https://gerrit.wikimedia.org/r/649638

Change 654891 had a related patch set uploaded (by Rosalie Perside (WMDE); owner: Rosalie Perside (WMDE)):
[mediawiki/core@master] Revert "Update formatNum implementation to match tr35 and latest CLDR"

https://gerrit.wikimedia.org/r/654891

@cscott pinging you because you authored this commit https://gerrit.wikimedia.org/r/c/mediawiki/core/+/384006 which we believe introduced the bug.

Change 656236 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Check whether MediaWiki can precisely format a number

https://gerrit.wikimedia.org/r/656236

I uploaded a proposed solution (see the above change): check if the formatted number can be parsed to the exact same value, otherwise fall back to the unformatted number. The result is that smaller numbers (or numbers with fewer decimal places) are still formatted according to the language, whereas for others we now show the internal value:

Screenshot_2021-01-14 Douglas Adams-en.png (537×1 px, 32 KB)

Screenshot_2021-01-14 Douglas Adams-ar.png (537×1 px, 26 KB)

This mixture of two styles (which is more noticeable in languages with non-ASCII digits, such as Arabic in the second screenshot) can look ugly, but I think it’s the best we can do given the current limitations of MediaWiki / PHP. @Lydia_Pintscher do you think it’s acceptable?
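The round-trip check described in this comment can be sketched roughly as follows. This is not the code from the Wikibase change: formatOrFallBack() is a hypothetical name, and the formatter and parser are passed in as callables. The lossy formatter below goes through a float on purpose, as a stand-in for any formatter that cannot represent the value exactly, and it only handles integers.

```php
<?php
// Hypothetical sketch, NOT the actual Wikibase patch: use the localized
// string only if parsing it back yields exactly the original value.
function formatOrFallBack( string $value, callable $format, callable $parse ): string {
	$formatted = $format( $value );
	return $parse( $formatted ) === $value ? $formatted : $value;
}

// Stand-in formatter (assumption): converts to float, so it loses precision
// for integers that a double cannot represent exactly.
$lossyFormat = static function ( string $value ): string {
	return number_format( (float)$value, 0, '.', ',' );
};
// Matching parser: strip the grouping separators again.
$parse = static function ( string $formatted ): string {
	return str_replace( ',', '', $formatted );
};

echo formatOrFallBack( '1234567', $lossyFormat, $parse ), "\n";
echo formatOrFallBack( '9999999999999999999', $lossyFormat, $parse ), "\n";
```

The first call returns the grouped number, while the second returns the raw input because the formatted string no longer parses back to the same value.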

I'll look at the code, the idea seems reasonable. We could also file a bug upstream to see if PHP will add a bignum or string interface. Could also check libicu to see what types they are supporting (maybe the fault is just in the PHP wrapper).

Worth noting that it was Santhosh and the language team who were the original authors of the patch (and task); I just pushed it over the line.

Thanks for looking into the upstream fix, Scott!

As discussed with Lucas, let's get this merged now. It's better than showing wrong data.

Change 656236 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Check whether MediaWiki can precisely format a number

https://gerrit.wikimedia.org/r/656236

Addshore subscribed.

Tested on beta using the examples in the description.
This will probably be deployed to production this week.

Change 654891 abandoned by Rosalie Perside (WMDE):

[mediawiki/core@master] Revert "Update formatNum implementation to match tr35 and latest CLDR" and revert "Use Unicode minus in output of {{formatenum}}"

Reason:

https://gerrit.wikimedia.org/r/654891