
Very small (or very large) quantity values (represented in scientific notation) result in error in add/update via pywikibot/wikidata API
Open, High, Public

Description

pywikibot was recently updated to better handle decimal values - see this gerrit code change: https://gerrit.wikimedia.org/r/#/c/250497/ - however, as I noted there in a comment at the end, there is a problem with very small values (and I believe also for very large ones) which the formatter converts to exponential notation. The Wikidata API does not accept numbers for quantity values formatted with exponential notation. Either the formatter on the pywikibot side needs to be smarter in converting values to a standard decimal value the API understands, or the API needs to be more generous in accepting scientific notation.

Here's the symptom of the problem: I tried adding a "proportion" qualifier value that is 1.9e-9. I get the following warning and stack trace:

WARNING: API error invalid-snak: Invalid snak (Value must match the pattern for decimal values.)
Traceback (most recent call last):
File "pwb.py", line 248, in <module>
  if not main():
 ...
  claim.addQualifier(prop_qual, bot=True, summary="Adding branching fraction qualifier from NNDC.")
File ".../core/pywikibot/page.py", line 4404, in addQualifier
  data = self.repo.editQualifier(self, qualifier, **kwargs)
File ".../core/pywikibot/site.py", line 1297, in callee
  return fn(self, *args, **kwargs)
File ".../core/pywikibot/site.py", line 7019, in editQualifier
  data = req.submit()
File ".../core/pywikibot/data/api.py", line 2178, in submit
  raise APIError(**result['error'])
pywikibot.data.api.APIError: invalid-snak: Invalid snak (Value must match the pattern for decimal values.) [messages:[{'parameters': [], 'name': 'wikibase-api-invalid-snak', 'html': {'*': 'Invalid snak'}}]; help:See https://www.wikidata.org/w/api.php for API usage]

I have modified the pywikibot code to format the quantity values as "+0.0000000019" rather than "+1.9e-09" and it goes through just fine. That is one solution, but it would probably be better for the API to handle scientific notation properly, as this will come up with any client that tries to provide very small (or large) values as quantities.

Event Timeline

ArthurPSmith raised the priority of this task from to Needs Triage.
ArthurPSmith updated the task description. (Show Details)
ArthurPSmith subscribed.

Please note this is still an issue with the latest pywikibot code and current wikidata release - as of June 23, 2016. The following is the fix I have in the pywikibot core pywikibot/__init__.py file:

instead of

format(value, "+g")

we need:

if math.fabs(value) < 0.001:
    num_str = float_fix.convert_sc_to_str(float(value))
    if value >= 0:
        num_str = '+{0}'.format(num_str)
else:
    num_str = format(value, "+g")
return num_str

where float_fix.convert_sc_to_str() is a custom function that creates the correct string format, since nothing in python seems to do the trick. Probably better to fix this on the API end, but I'm happy to share the float_fix code if that's preferred.
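For comparison, the standard decimal module may be enough for a helper of this kind. The following is only a sketch: the name float_to_plain_str is made up for illustration, it is not the float_fix code mentioned above, and it assumes the value arrives as a finite Python float.

```python
from decimal import Decimal


def float_to_plain_str(value):
    """Format a finite float as a signed decimal string without an exponent.

    Hypothetical helper for illustration; not part of pywikibot.
    """
    # repr() gives the shortest string that round-trips the float,
    # e.g. '1.9e-09'. Decimal parses that string exactly.
    dec = Decimal(repr(value))
    # For Decimal, the 'f' presentation type never switches to
    # exponential notation, and '+' forces an explicit sign.
    return format(dec, '+f')


print(float_to_plain_str(1.9e-9))    # +0.0000000019
print(float_to_plain_str(-1.5))      # -1.5
print(float_to_plain_str(1.935e35))  # +1935 followed by 32 zeros
```

This only changes how the float is written out. Any precision already lost by storing the value as a float is not recovered.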

Here is some information on the range of values the API accepts: https://www.wikidata.org/wiki/Help:Statements#Quantitative_values

We probably need more testing so things like 1e-123 arrive as:

0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

and 1.0e-123 arrives as:

0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010

Those are some rather important edge cases. The API works more like a metrologist or engineering software where trailing zeros are preserved. But I am also not an expert in this topic.
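As a rough illustration of the trailing-zero point, Python's decimal.Decimal keeps the number of digits it was given when built from a string, so the two inputs above produce different plain strings. This is a sketch of the client-side formatting only; the helper name fraction_digits is made up, and nothing here shows what the API itself does with the values.

```python
from decimal import Decimal


def fraction_digits(text):
    """Count digits after the decimal point once the exponent is expanded."""
    # 'f' formatting of a Decimal expands the exponent without rounding,
    # so a trailing zero written in the input is still there afterwards.
    plain = format(Decimal(text), 'f')
    return len(plain.split('.')[1])


for text in ('1e-123', '1.0e-123'):
    print(text, '->', fraction_digits(text), 'digits after the decimal point')
```

Going through a float first would discard the distinction, since 1e-123 and 1.0e-123 are the same float.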

As far as testing goes, I have (in my own copy) added the following to the pywikibot tests/wikibase_edit_tests.py file (within the class TestWikibaseMakeClaim):

def _check_quantity_claim(self, value, uncertainty):
    """Helper function to add and check quantity claims"""
    testsite = self.get_repo()
    item = self._clean_item(testsite, 'P64')

    # set new claim
    claim = pywikibot.page.Claim(testsite, 'P64', datatype='quantity')
    target = pywikibot.WbQuantity(value, error=uncertainty)
    claim.setTarget(target)
    item.addClaim(claim)
    item.get(force=True)
    claim = item.claims['P64'][0]
    self.assertEqual(claim.getTarget(), target)


def test_medium_quantity_edit(self):
    """Attempt to add medium-size quantity claim."""
    self._check_quantity_claim(1.5, 0.1)


def test_small_quantity_edit(self):
    """Attempt to add very small quantity claim."""
    self._check_quantity_claim(1.0e-7, 2.0e-8)


def test_large_quantity_edit(self):
    """Attempt to add large quantity claim."""
    self._check_quantity_claim(1.935e35, 1e32)


def test_negative_quantity_edit(self):
    """Attempt to add negative quantity claims."""
    self._check_quantity_claim(-1.5, 0.1)

When these tests are run via
python pwb.py tests/wikibase_edit_tests.py -v
both test_large_quantity_edit() and test_small_quantity_edit() fail, with messages:

Attempt to add large quantity claim. ... WARNING: API error invalid-snak: Invalid snak (Value must match the pattern for decimal values.)
Attempt to add very small quantity claim. ... WARNING: API error invalid-snak: Invalid snak (Value must match the pattern for decimal values.)

I believe what you do is constructing a JSON blob, and that is not allowed to have exponential representation, see https://github.com/DataValues/Number/blob/master/src/DataValues/DecimalValue.php#L43. When you use the parser it is allowed to have other formats, see https://github.com/DataValues/Number/blob/master/src/ValueParsers/DecimalParser.php and the relevant test cases in the same code repository.

Hmm. So is it a pywikibot problem or a wikibase API problem? Is pywikibot sending in JSON format?

That restriction is NOT in the JSON spec: http://tools.ietf.org/html/rfc7159.html#section-6 - also the leading plus is not required by JSON. Is there some other reason for the limitation in the wikidata code? DataValues is a wikidata-specific PHP library right? I can't think of any good reason to keep this limitation on input values.

This limitation is not on input values. Use the wbparsevalue API, which is what the wikidata.org UI does and what https://www.wikidata.org/wiki/Help:Statements#Quantitative_values describes, and you can have all kinds of inputs.
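A request to that module might be put together roughly as follows. This is a sketch under assumptions: the endpoint is the one shown in the error message earlier in this task, the parameter names datatype and values are written from memory and should be checked against the API help page, and no request is actually sent here.

```python
from urllib.parse import urlencode

# Endpoint taken from the error message quoted earlier in this task.
API_URL = 'https://www.wikidata.org/w/api.php'


def build_parse_request(raw_value):
    """Build a wbparsevalue URL for one user-supplied quantity string.

    Hypothetical helper; parameter names are assumptions to verify
    against the API documentation.
    """
    params = {
        'action': 'wbparsevalue',
        'datatype': 'quantity',
        'values': raw_value,
        'format': 'json',
    }
    return API_URL + '?' + urlencode(params)


print(build_parse_request('1.9e-9'))
```

The idea is that the server, not the client, turns the raw text into the string-based structure it is willing to store.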

The resulting QuantityValue data structure must be in the format the Wikibase code base specifies in https://github.com/DataValues/Number/blob/master/src/DataValues/DecimalValue.php#L43 (a QuantityValue is constructed of 3 DecimalValues and the unit). This format does not use the numbers from the JSON spec, but strings, to avoid all kinds of rounding issues and data loss that happen when you enter the IEEE world. Pywikibot should do the same and avoid doing math on IEEE floating point numbers.

So yes, I'm afraid this is an issue in Pywikibot not following the Wikibase specifications.
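A client could check its strings before submitting them. The pattern below is an approximation written for illustration (mandatory sign, no exponent, optional fractional part) and is an assumption, not a copy of the linked DecimalValue.php source, which remains the authority.

```python
import re

# Assumed approximation of the decimal pattern: a mandatory sign, an
# integer part without superfluous leading zeros, an optional fraction,
# and no exponent. Check DecimalValue.php for the real definition.
DECIMAL_PATTERN = re.compile(r'[-+](?:[1-9][0-9]*|[0-9])(?:\.[0-9]+)?')


def looks_like_wikibase_decimal(text):
    """Return True if text matches the assumed pattern in full."""
    return DECIMAL_PATTERN.fullmatch(text) is not None


for candidate in ('+0.0000000019', '+1.9e-09', '1.5'):
    print(candidate, looks_like_wikibase_decimal(candidate))
```

Under this assumed pattern, "+1.9e-09" is rejected because of the exponent, which matches the error reported in the task description.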

You're the one who brought up JSON! It sounds like the issue is something different though - internal representation as strings? Anyway, are you recommending pywikibot use the wbparsevalue API for all (or at least numerical) input? That could be a good idea. Looks like there was already a Phabricator ticket on this - T112140

Tobias, any thoughts?

Pywikibot should not assume all QuantityValues can be cast to IEEE numbers. For example, a QuantityValue can be "1000000.00000000054321". Depending on the data types you have in your programming language (single, double or something else), converting this to a number and back to a string will result in something like "1000000.0000000006" or worse:

1000000.00000000054321.toFixed( 40 )
// Output: "1000000.0000000005820766091346740722656250000000"

That should be avoided, obviously. Never cast the elements from a QuantityValue to IEEE numbers when not necessary for actual calculations.

When dealing with user inputs you can (and should) either use wbparsevalue, or come up with your own parser that converts this to a string allowed in a QuantityValue. Again, make sure you are converting "100000000000000054321e-14" to "1000000.00000000054321" without losing precision.
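In Python, one way to do that conversion without touching IEEE floats is decimal.Decimal, which keeps every digit of the input string. The sketch below uses made-up helper names and assumes the input is already a well-formed numeric string.

```python
from decimal import Decimal


def to_plain_decimal_string(text):
    """Expand exponent notation into a signed plain decimal string."""
    # The Decimal constructor keeps every digit of the input, and 'f'
    # formatting without a precision does not round.
    return format(Decimal(text), '+f')


def survives_float_round_trip(text):
    """Check whether text -> float -> text would keep the exact value."""
    return Decimal(repr(float(text))) == Decimal(text)


print(to_plain_decimal_string('100000000000000054321e-14'))
# +1000000.00000000054321
print(survives_float_round_trip('1000000.00000000054321'))  # False
print(survives_float_round_trip('1.5'))                     # True
```

The second helper only shows why the float detour is unsafe for the example value; a real client would simply never make that detour.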

Ok, that echoes something Tobias has said also about using strings and avoiding IEEE fp. I'm going to look at getting T112140 working first and then see if I can bring that implementation to bear on this.

DD063520 subscribed.

Hello,

I'm also encountering this. I also saw that it is related to this:

https://phabricator.wikimedia.org/T204331

Can we make a pywikibot patch? I tried to make one, but not there yet ....

D063520

Ok,

I think I found a patch. We can change this https://github.com/wikimedia/pywikibot/blob/2dfe67426c22c3c11cf9be0eabcf538f8848bd48/pywikibot/__init__.py#L847:

def toWikibase(self):
        """
        Convert the data to a JSON object for the Wikibase API.
        @return: Wikibase JSON
        @rtype: dict
        """
        json = {'amount': self._fromdecimal(self.amount),
                'upperBound': self._fromdecimal(self.upperBound),
                'lowerBound': self._fromdecimal(self.lowerBound),
                'unit': self.unit
                }
        return json

to:

def toWikibase(self):
    """
    Convert the data to a JSON object for the Wikibase API.

    @return: Wikibase JSON
    @rtype: dict
    """
    json = {'amount': '{:f}'.format(Decimal(self._fromdecimal(self.amount))),
            'upperBound': '{:f}'.format(Decimal(self._fromdecimal(self.upperBound))),
            'lowerBound': '{:f}'.format(Decimal(self._fromdecimal(self.lowerBound))),
            'unit': self.unit
            }
    return json

Should I submit this? Or do you see any problems?

Salut
D063520

PS: I add @Xqt because I guess you can help here

Sorry I never got around to looking at this further. @DD063520 do you understand the above comment from @thiemowmde about using the wbparsevalue api rather than python internals?

No problem ... there is still time to do it ;) ... I see what @thiemowmde means. But I'm not sure if I can implement this. The problem is that there can be precision problems if it is done as I propose, right?

My T204331 is probably a duplicate.

Should I submit this?

Sure, reviews are better in Gerrit.

There are two problems:

  • Pywikibot can lose precision when manipulating data
  • Pywikibot submits values that are not valid according to Wikibase (e.g. "+1.9e-09")

Xqt triaged this task as High priority. Feb 16 2020, 8:14 AM

PS: I add @Xqt because I guess you can help here

Like matej_suchanek suggests please submit patches to gerrit for reviewing. You may use Gerrit patch uploader if you don't want your own account there.

Anyway 'amount': '{:f}'.format(Decimal(self._fromdecimal(self.amount))) looks a bit odd to me. self._fromdecimal seems to be a method converting decimal.Decimal to something else, and Decimal(<something else>) converts it back. The result sounds like what a self._todecimal() method would return. Anyway, I haven't investigated this matter any deeper yet.
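One way to avoid that round trip would be to format the stored Decimal values directly. The following is a standalone sketch, not the pywikibot implementation: the function names are hypothetical, it assumes the amounts are held as decimal.Decimal (or None for a missing bound), and it uses '+f' on the assumption that Wikibase expects an explicit sign, as in the "+0.0000000019" example from the task description.

```python
from decimal import Decimal


def format_bound(value):
    """Return a signed, exponent-free string, or None for a missing bound."""
    if value is None:
        return None
    # Decimal(value) is a no-op for Decimal input and also accepts strings.
    return format(Decimal(value), '+f')


def quantity_to_json(amount, unit, upper=None, lower=None):
    """Build the dict for a quantity (hypothetical stand-in for toWikibase)."""
    return {
        'amount': format_bound(amount),
        'upperBound': format_bound(upper),
        'lowerBound': format_bound(lower),
        'unit': unit,
    }


print(quantity_to_json(Decimal('1.9e-9'), unit='1'))
```

Handling None explicitly matters here because Decimal(None) raises a TypeError.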

This comment was removed by DD063520.

Sorry, I'm not able to perform this task. I was not even aware it was assigned to me.

Xqt removed Aklapper as the assignee of this task. Aug 22 2022, 1:45 PM

Here is some information on the range of values the API accepts: https://www.wikidata.org/wiki/Help:Statements#Quantitative_values

We probably need more testing so things like 1e-123 arrive as:

0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

and 1.0e-123 arrives as:

0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010

Those are some rather important edge cases. The API works more like a metrologist or engineering software where trailing zeros are preserved. But I am also not an expert in this topic.

I feel like you would want to preserve sig figs, although that could probably be replaced with lower/upper bounds, which would be more in compliance with other values.