Site.parsevalue() gives wrong results
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue:

I changed the last statements of http.fetch() as follows (added print statements and disabled the encoding detection):

    try:
        # Note that the connections are pooled which mean that a future
        # HTTPS request can succeed even if the certificate is invalid and
        # verify=True, when a request with verify=False happened before
        response = session.request(method, uri,
                                   headers=headers, auth=auth, timeout=timeout,
                                   **kwargs)
    except Exception as e:
        response = e
    else:
        pass
#        response.encoding = _decide_encoding(response, charset)  ## ignore encodings

    for callback in callbacks:
        callback(response)

    from pprint import pprint  ## added some prints
    pprint(response.json())
    pprint(response.text)
    pprint(response.content)
    return response

and then ran these statements:

import pywikibot 
site = pywikibot.Site('wikidata')
result = site.parsevalue('quantity', ['1.90e-9+-0.20e-9'], {}, False)

What happens?:
I always got:

https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=quantity&values=1.90e-9%2B-0.20e-9&options=%7B%7D&maxlag=5&format=json
{'results': [{'raw': '1.90e-9+-0.20e-9',
              'type': 'quantity',
              'value': {'amount': '+0.000000190',
                        'lowerBound': '+0.000000170',
                        'unit': '1',
                        'upperBound': '+0.000000210'}}]}
'{"results":[{"raw":"1.90e-9+-0.20e-9","value":{"amount":"+0.000000190","unit":"1","upperBound":"+0.000000210","lowerBound":"+0.000000170"},"type":"quantity"}]}'
(b'{"results":[{"raw":"1.90e-9+-0.20e-9","value":{"amount":"+0.000000190","unit'
 b'":"1","upperBound":"+0.000000210","lowerBound":"+0.000000170"},"type":"quant'
 b'ity"}]}')

even after I cleared the cache. This means that the raw bytes content coming from the wiki is already wrong, and that text as well as json() decode it correctly. This is the bytes response (from above):

(b'{"results":[{"raw":"1.90e-9+-0.20e-9","value":{"amount":"+0.000000190","unit'
 b'":"1","upperBound":"+0.000000210","lowerBound":"+0.000000170"},"type":"quant'
 b'ity"}]}')

What should have happened instead?:
The 'amount' value (and likewise both bounds) should be smaller by a factor of 100:

{'results': [{'raw': '1.90e-9+-0.20e-9',
              'type': 'quantity',
              'value': {'amount': '+0.00000000190',
                        'lowerBound': '+0.00000000170',
                        'unit': '1',
                        'upperBound': '+0.00000000210'}}]}

A direct API call with the URI given above returned the correct result.
The URI is https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=quantity&values=1.90e-9%2B-0.20e-9&options=%7B%7D&maxlag=5&format=jsonfm and the result is:

{
    "results": [
        {
            "raw": "1.90e-9+-0.20e-9",
            "value": {
                "amount": "+0.00000000190",
                "unit": "1",
                "upperBound": "+0.00000000210",
                "lowerBound": "+0.00000000170"
            },
            "type": "quantity"
        }
    ]
}

Software version:

D:\>python
Python 3.10.2 (tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
D:\>pip freeze
certifi==2019.6.16
chardet==3.0.4
idna==2.8
mwparserfromhell==0.6.4
requests==2.22.0
urllib3==1.25.3

setuptools==57.0.0

I also checked it with these packages:

D:\>pip freeze
certifi==2019.6.16
chardet==3.0.4
idna==2.8
regex==2022.7.9
requests==2.22.0
urllib3==1.25.3
wcwidth==0.2.5
wikitextparser==0.47.4

setuptools==57.0.0

Other information:
It worked with Python 3.7.3 and the last set of packages listed above.

Event Timeline

Digging deeper into this issue, I found that replacing the dot with a comma gives the expected result:

>>> result = site.parsevalue('quantity', ['1,90e-9+-0,20e-9'], {}, False)
>>> result
[{'amount': '+0.00000000190', 'unit': '1', 'upperBound': '+0.00000000210', 'lowerBound': '+0.00000000170'}]

The URI was https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=quantity&values=1%2C90e-9%2B-0%2C20e-9&options=%7B%7D&maxlag=5&format=json, but calling it directly gives the wrong result too:

{
    "results": [
        {
            "raw": "1,90e-9+-0,20e-9",
            "value": {
                "amount": "+0.000000190",
                "unit": "1",
                "upperBound": "+0.000000210",
                "lowerBound": "+0.000000170"
            },
            "type": "quantity"
        }
    ]
}

I also compared the headers and other session parameters between Python 3.10.2 and 3.7.3 and didn't find any difference. Finally, I deleted the cookie file pywikibot.lwb, which solved the issue.

But where is the trick? And how can we detect that there is something going wrong?
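One possible way to detect the problem on the client side is to compare the parsed value with the number that was sent. The following is only a minimal sketch, not part of Pywikibot: the helper names `split_quantity` and `matches_input` are hypothetical, and it assumes the input uses a dot as decimal separator and the `amount+-margin` notation from this report.

```python
from decimal import Decimal


def split_quantity(raw):
    """Split 'amount+-margin' into Decimals; margin is None if absent.

    Hypothetical helper; assumes '.' is the decimal separator.
    """
    amount_text, _, margin_text = raw.partition('+-')
    amount = Decimal(amount_text)
    margin = Decimal(margin_text) if margin_text else None
    return amount, margin


def matches_input(raw, value):
    """Return True if a wbparsevalue 'value' dict agrees with the raw input."""
    amount, margin = split_quantity(raw)
    if Decimal(value['amount']) != amount:
        return False
    if margin is not None:
        return (Decimal(value['upperBound']) == amount + margin
                and Decimal(value['lowerBound']) == amount - margin)
    return True


raw = '1.90e-9+-0.20e-9'
# Value as returned in the failing session (copied from the report).
wrong = {'amount': '+0.000000190', 'unit': '1',
         'upperBound': '+0.000000210', 'lowerBound': '+0.000000170'}
# Value as returned by the direct API call (copied from the report).
right = {'amount': '+0.00000000190', 'unit': '1',
         'upperBound': '+0.00000000210', 'lowerBound': '+0.00000000170'}
print(matches_input(raw, wrong), matches_input(raw, right))
```

Such a check only covers plain numeric input; it is no substitute for fixing the request itself.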

The problem was found upstream:
I found out that this strange behavior depends on the bot user's global (maybe also local) internationalisation language setting and may occur if the language is other than 'en'. I guess this is not intentional for an API request but a bug.
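The factor of 100 is consistent with the dot being treated as a digit-grouping separator, as in languages that write the decimal mark as a comma. The sketch below only illustrates that arithmetic. It is not the Wikibase parser, `parse_localized` is a hypothetical helper, and the assumption that the user language groups digits with '.' is not confirmed in this report.

```python
from decimal import Decimal


def parse_localized(text, decimal_sep, group_sep):
    """Parse a number written with the given separators (illustration only)."""
    # Drop grouping separators first, then normalise the decimal mark to '.'.
    plain = text.replace(group_sep, '').replace(decimal_sep, '.')
    return Decimal(plain)


# English conventions: '.' is the decimal mark.
english = parse_localized('1.90e-9', decimal_sep='.', group_sep=',')
# Assumed non-English conventions: '.' groups digits, ',' is the decimal mark.
dot_as_group = parse_localized('1.90e-9', decimal_sep=',', group_sep='.')
comma_as_decimal = parse_localized('1,90e-9', decimal_sep=',', group_sep='.')

print(format(english, 'f'))
print(format(dot_as_group, 'f'))
print(format(comma_as_decimal, 'f'))
```

Under that assumption, '1.90e-9' is read as 190e-9, which matches the wrong amount '+0.000000190', while '1,90e-9' yields the intended value.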

Yes, it probably depends on the language. When you edit e.g. dates on Wikidata manually, it will try to parse the input in your language (or English). I think it can be changed using uselang=.

So maybe we should always force parsing in English and add an argument to allow overriding this.
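As a rough sketch of that idea, the request could carry `uselang=en` by default while still letting the caller override it. This is not the actual Pywikibot change; `parsevalue_url` is a hypothetical helper that only builds the query URL and sends nothing.

```python
from urllib.parse import urlencode

API_URL = 'https://www.wikidata.org/w/api.php'


def parsevalue_url(datatype, values, options='{}', uselang='en'):
    """Build a wbparsevalue URL that forces the parsing language.

    Hypothetical helper; 'uselang' defaults to English but can be overridden.
    """
    params = {
        'action': 'wbparsevalue',
        'datatype': datatype,
        'values': '|'.join(values),  # multiple values are pipe-separated
        'options': options,
        'uselang': uselang,
        'format': 'json',
    }
    return f'{API_URL}?{urlencode(params)}'


url = parsevalue_url('quantity', ['1.90e-9+-0.20e-9'])
print(url)
```

Passing a different `uselang` value would allow parsing input written in another language's number format on purpose.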

Change 812872 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [IMPR] Always set uselang=en for parsing parsevalue

https://gerrit.wikimedia.org/r/812872

Great idea. Thank you.

Change 812872 merged by jenkins-bot:

[pywikibot/core@master] [IMPR] Always set uselang=en for parsing parsevalue

https://gerrit.wikimedia.org/r/812872

Xqt claimed this task.