The Wikidata API provides a function that parses a string and tries to return a value of a given datatype. See https://www.wikidata.org/w/api.php?action=help&recursivesubmodules=1#wbparsevalue . We should probably have a wrapper function for this in the site object, so that we can outsource some of the string parsing to the API instead of reinventing the wheel ourselves.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add wrapper function around wbparsevalue | pywikibot/core | master | +89 -1
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | Feature | matej_suchanek | T66503 Add datatype time to harvest_template.py for importing dates
Open | | None | T112141 Update harvest_template.py to use wbparsevalue and accept arbitrary datatypes
Resolved | | Xqt | T112140 Provide a wrapper function in pywikibot around wbparsevalue
Event Timeline
I'm going to have a shot at implementing this; it looks like it will be useful for a number of other open Phabricator issues for pywikibot. I was thinking of a function that takes all the parameters the API offers (datatype: a string, values: a list of strings, options: a dict, validate: a boolean). Any other recommendations?
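A minimal sketch of how those four parameters could map onto the request, assuming the usual MediaWiki conventions (multi-value parameters joined with `|`, JSON for `options`, boolean flags sent only when true). `build_parsevalue_params` is a hypothetical name; it only builds the query and does not use pywikibot's request machinery or contact the server.

```python
import json
from urllib.parse import urlencode


def build_parsevalue_params(datatype, values, options=None, validate=False):
    """Build (hypothetical) query parameters for action=wbparsevalue."""
    values = [str(value) for value in values]
    if not values:
        raise ValueError('at least one value is required')
    if any('|' in value for value in values):
        # MediaWiki's alternative multi-value syntax: a leading U+001F,
        # then values separated by U+001F instead of '|'.
        joined = '\x1f' + '\x1f'.join(values)
    else:
        joined = '|'.join(values)
    params = {
        'action': 'wbparsevalue',
        'format': 'json',
        'datatype': datatype,
        'values': joined,
    }
    if options:
        # The options parameter takes a JSON-encoded object.
        params['options'] = json.dumps(options)
    if validate:
        # Boolean API parameters count as true whenever they are present.
        params['validate'] = ''
    return params


query = urlencode(build_parsevalue_params('time', ['1994-02-08'], validate=True))
print('https://www.wikidata.org/w/api.php?' + query)
```

Keeping `values` as a list lets one request parse several strings at once, which is what the API supports.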
The function should return an object. Possibilities seem to be commonsMedia, globe-coordinate, monolingualtext, quantity, string, time, url, external-id, wikibase-item, wikibase-property, math
> In T112140#2435122, @Multichill wrote:
> The function should return an object. Possibilities seem to be commonsMedia, globe-coordinate, monolingualtext, quantity, string, time, url, external-id, wikibase-item, wikibase-property, math
The parse API allows a list of values to be parsed (not just one at a time), and I have written the function to return a list of the parsed "objects": either just the value (for strings) or a Python dict with the keys and values supplied by the wbparsevalue API. For example, parsing a list of quantity values returns a list of dicts with the keys 'amount', 'upperBound' and 'lowerBound', plus possibly other keys provided by the API (it also returns 'unit' with value '1').
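A sketch of that unwrapping step, assuming the response has the shape `{"results": [{"raw": ..., "value": ...}, ...]}` and that a failing value is flagged with an `error` key on its entry (the error shape is an assumption). The sample response is illustrative, not captured API output, and `extract_parsed_values` is a hypothetical name.

```python
def extract_parsed_values(response):
    """Return the parsed value of each result in a wbparsevalue response.

    String-like datatypes yield plain strings; structured datatypes
    (quantity, time, ...) yield dicts.
    """
    values = []
    for result in response.get('results', []):
        if 'error' in result:
            # Assumed error shape: an 'error' key on the failing entry.
            raise ValueError('could not parse %r' % result.get('raw'))
        values.append(result['value'])
    return values


# Illustrative response for two quantity inputs (bounds omitted).
sample = {'results': [
    {'raw': '5', 'value': {'amount': '+5', 'unit': '1'}},
    {'raw': '12', 'value': {'amount': '+12', 'unit': '1'}},
]}
print([v['amount'] for v in extract_parsed_values(sample)])  # ['+5', '+12']
```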
Unfortunately, some existing pywikibot classes such as WbQuantity or WbTime do not work for this because they use Decimal or long/int rather than string values (although it looks like WbMonolingualText could work). I thought making that change to the classes ought to be a separate step from providing access to the parse API.
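A small illustration of the string-versus-number question, assuming the API returns quantity fields as signed decimal strings as described above. Turning those strings into `decimal.Decimal` values is a one-line step, and `Decimal` (unlike `float`) keeps the digits as written. `quantity_to_decimals` is a hypothetical helper, not pywikibot's `WbQuantity`.

```python
from decimal import Decimal

NUMERIC_KEYS = ('amount', 'upperBound', 'lowerBound')


def quantity_to_decimals(value):
    """Convert the numeric string fields of a parsed quantity to Decimal.

    Hypothetical helper; other keys such as 'unit' pass through unchanged.
    """
    return {key: Decimal(field) if key in NUMERIC_KEYS else field
            for key, field in value.items()}


parsed = {'amount': '+1234.500', 'unit': '1'}
converted = quantity_to_decimals(parsed)
print(converted['amount'])      # 1234.500 (written precision kept)
print(float(parsed['amount']))  # 1234.5 (trailing zeros lost)
```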
I don't get it. The output of https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=time&validate&values=1994-02-08 seems suitable for building a WbTime object with https://phabricator.wikimedia.org/diffusion/PWBC/browse/master/pywikibot/__init__.py;b6e933d4e33e1035457755015b29da74b8c8160a$512 . What's wrong with that?
@Multichill: could be; I'm not familiar with WbTime beyond a glance at the code. Are there edge cases (e.g. 10^20 years into the future) that would break the int/long assumptions? But it definitely does NOT work for WbQuantity as things currently stand. Fixing WbQuantity seemed to be out of scope here, though it does need to be done. Coordinate may have similar issues, as it uses floats.
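On the int/long question: the time value comes back as a string such as `+1994-02-08T00:00:00Z`, and Python integers are arbitrary precision, so splitting it into integer fields does not overflow even for very distant years. The regular expression and `split_wikibase_time` are assumptions for illustration, not pywikibot's `WbTime` parsing code.

```python
import re

# Signed year of any length, then month, day and time; month/day may be 00.
TIME_RE = re.compile(r'^([-+]?\d+)-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$')


def split_wikibase_time(timestr):
    """Split a Wikibase-style time string into integer fields."""
    match = TIME_RE.match(timestr)
    if match is None:
        raise ValueError('not a Wikibase time string: %r' % timestr)
    return tuple(int(part) for part in match.groups())


print(split_wikibase_time('+1994-02-08T00:00:00Z'))  # (1994, 2, 8, 0, 0, 0)

# A year of 10**20 parses without any overflow.
far_future = '+1' + '0' * 20 + '-00-00T00:00:00Z'
print(split_wikibase_time(far_future)[0] == 10 ** 20)  # True
```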
According to the description of the Property class in pywikibot/page.py, the classes actually involved for the different object types are:
- ItemPage or PropertyPage
- basestring
- FilePage
- Coordinate
- WbTime
- WbQuantity
- WbMonolingualText
The parser doesn't seem to do anything special for wikibase-item types; creating the object would, I think, involve another API query. It seems to me that each case would need to be handled at least a little differently.
One route would be a separate function to turn the parsed values into pywikibot objects - for example a special constructor for WbTime that takes the parsed values.
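A minimal sketch of that route: a separate constructor that accepts the dict returned for `datatype=time`. `ParsedTime` and `from_parsed` are hypothetical names, and the keys (`time`, `precision`, `before`, `after`, `timezone`, `calendarmodel`) are assumed from the Wikibase JSON time format rather than taken from pywikibot's WbTime.

```python
from dataclasses import dataclass


@dataclass
class ParsedTime:
    """Hypothetical time value built from wbparsevalue output."""

    time: str
    precision: int
    before: int = 0
    after: int = 0
    timezone: int = 0
    calendarmodel: str = ''

    @classmethod
    def from_parsed(cls, value):
        """Alternative constructor taking the parsed 'value' dict."""
        return cls(time=value['time'],
                   precision=int(value['precision']),
                   before=int(value.get('before', 0)),
                   after=int(value.get('after', 0)),
                   timezone=int(value.get('timezone', 0)),
                   calendarmodel=value.get('calendarmodel', ''))


# Illustrative input; precision 11 means day precision in Wikibase.
parsed = {'time': '+1994-02-08T00:00:00Z', 'precision': 11}
print(ParsedTime.from_parsed(parsed).precision)  # 11
```

Keeping this as a classmethod leaves the existing constructor untouched, which matches the idea of handling the conversion as a separate step.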
Item:
- https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=wikibase-item&values=Q42
Property:
- Checks whether it exists, so https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=commonsMedia&validate&values=Iguana_marina_(Amblyrhynchus_cristatus),_Las_Bachas,_isla_Baltra,_islas_Gal%C3%A1pagos,_Ecuador,_2015-07-23,_DD_24.JPG gives an error
Coordinate:
Etc. All the output we get seems to match (and should match) the constructors we have for the objects.
This only makes sense for data types whose classes are subclasses of WbRepresentation:
https://phabricator.wikimedia.org/diffusion/PWBC/browse/master/pywikibot/_wbtypes.py%3Bb6e933d4e33e1035457755015b29da74b8c8160a . These classes all have a fromWikibase function.
So the logic should probably be:
- DataSite.parsevalue(datatype, value, options, validate)
- Check that datatype is the name of a subclass of WbRepresentation
- Send it to the server
- If the result is ok, return <subclass we found>.fromWikibase(<the json>)
Something like that?
This way, if a new datatype gets added, this part of the code doesn't need to change.
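The four steps above can be sketched with stand-in classes. `WbRepresentationStub`, the `_datatype` class attribute and the `send` callable (standing in for the real server request) are assumptions for illustration; only the idea of a `fromWikibase` classmethod on each subclass comes from this thread. Subclasses are discovered at call time, so adding a new one does not require changing `parsevalue`.

```python
class WbRepresentationStub:
    """Stand-in base class; each subclass names the datatype it handles."""

    _datatype = None

    @classmethod
    def fromWikibase(cls, json_value):
        raise NotImplementedError


class QuantityStub(WbRepresentationStub):
    """Illustrative subclass for the 'quantity' datatype."""

    _datatype = 'quantity'

    def __init__(self, amount, unit):
        self.amount = amount
        self.unit = unit

    @classmethod
    def fromWikibase(cls, json_value):
        return cls(json_value['amount'], json_value['unit'])


def find_representation(datatype, base=WbRepresentationStub):
    """Search all (direct and indirect) subclasses of base for datatype."""
    pending = list(base.__subclasses__())
    while pending:
        cls = pending.pop()
        if cls._datatype == datatype:
            return cls
        pending.extend(cls.__subclasses__())
    return None


def parsevalue(datatype, value, send, options=None, validate=False):
    """Parse one value server-side and wrap it in the matching class."""
    # Step 2: the datatype must map to a representation subclass.
    cls = find_representation(datatype)
    if cls is None:
        raise ValueError('no representation class for %r' % datatype)
    # Step 3: hand the value to the server (send is a stand-in callable).
    response = send(datatype, [value], options or {}, validate)
    entry = response['results'][0]
    if 'error' in entry:
        raise ValueError('server could not parse %r' % value)
    # Step 4: build the object from the parsed JSON.
    return cls.fromWikibase(entry['value'])


def fake_send(datatype, values, options, validate):
    """Offline stand-in returning a canned, illustrative response."""
    return {'results': [{'raw': values[0],
                         'value': {'amount': '+5', 'unit': '1'}}]}


quantity = parsevalue('quantity', '5', fake_send)
print(quantity.amount, quantity.unit)  # +5 1
```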
OK, the WbRepresentation superclass looks like it might help simplify this. But FilePage, ItemPage and PropertyPage (and basestring) are not subclasses of it, so I think just returning the JSON would be best for those. The function could certainly run fromWikibase for the other types, though; that seems pretty easy, and I'll look into it.
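A sketch of that mixed policy, with hypothetical names: datatypes that have a registered class go through its `fromWikibase`, and everything else (for example `string`, `commonsMedia` or `wikibase-item`) is returned as the raw parsed JSON. The `registry` dict and `CoordinateStub` are assumptions standing in for however the classes are actually looked up and defined.

```python
class CoordinateStub:
    """Hypothetical class with a fromWikibase-style constructor."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

    @classmethod
    def fromWikibase(cls, data):
        return cls(data['latitude'], data['longitude'])


def wrap_parsed(datatype, parsed_values, registry):
    """Wrap parsed values when a class is registered, else return raw JSON."""
    cls = registry.get(datatype)
    if cls is None:
        # e.g. string, commonsMedia, wikibase-item: hand back the JSON as-is.
        return list(parsed_values)
    return [cls.fromWikibase(value) for value in parsed_values]


registry = {'globe-coordinate': CoordinateStub}
```

With this registry, `wrap_parsed('string', ['abc'], registry)` returns `['abc']` unchanged, while `globe-coordinate` values become `CoordinateStub` instances.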
Change 297637 had a related patch set uploaded (by Lokal Profil; owner: ArthurPSmith):
[pywikibot/core@master] Add wrapper function around wbparsevalue
I might pick this up if nobody else beats me to it (go ahead). I actually have been using this a lot in my bots and it works quite well.
Change 297637 merged by jenkins-bot:
[pywikibot/core@master] Add wrapper function around wbparsevalue