
Provide a wrapper function in pywikibot around wbparsevalue
Closed, ResolvedPublic

Description

The Wikidata API provides a function that parses a string and tries to return a value of a given datatype. See https://www.wikidata.org/w/api.php?action=help&recursivesubmodules=1#wbparsevalue . We should probably have a wrapper function for this in the site object, so we can outsource some of the string parsing to the API instead of reinventing the wheel ourselves.

Event Timeline

Multichill raised the priority of this task to Needs Triage.
Multichill updated the task description.
Multichill subscribed.
ArthurPSmith subscribed.

I'm going to have a shot at implementing this; it looks like it will be useful for a number of other open Phabricator issues for pywikibot. I was thinking of a function that takes all the parameters the API offers (datatype: a string; values: a list of strings; options: a dict; validate: a boolean). Any other recommendations?
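A minimal sketch of how those four parameters might map onto the request, assuming the standard MediaWiki API conventions: multi-value parameters are joined with `|` (or with U+001F as separator and prefix when a value itself contains `|`), `options` is sent as a JSON-encoded object, and a boolean flag such as `validate` counts as true whenever it is present. The helper name `build_parsevalue_params` is hypothetical; it only builds the parameter dict and sends nothing.

```python
import json
from typing import Dict, List, Optional


def build_parsevalue_params(datatype: str,
                            values: List[str],
                            options: Optional[dict] = None,
                            validate: bool = False) -> Dict[str, str]:
    """Build query parameters for action=wbparsevalue (hypothetical helper)."""
    if any('|' in value for value in values):
        # MediaWiki convention: prefix with U+001F and use it as the separator
        joined = '\x1f' + '\x1f'.join(values)
    else:
        joined = '|'.join(values)
    params = {
        'action': 'wbparsevalue',
        'format': 'json',
        'datatype': datatype,
        'values': joined,
    }
    if options:
        # The API expects the parser options as a JSON object
        params['options'] = json.dumps(options)
    if validate:
        # Boolean API parameters are true when present, whatever their value
        params['validate'] = ''
    return params
```

For example, `build_parsevalue_params('quantity', ['1.5', '3'])` produces `values='1.5|3'` and no `validate` key, so one request can parse several values at once.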

The function should return an object. Possibilities seem to be commonsMedia, globe-coordinate, monolingualtext, quantity, string, time, url, external-id, wikibase-item, wikibase-property, math.

>>! In T112140#2435122, @Multichill wrote:

The function should return an object. Possibilities seem to be commonsMedia, globe-coordinate, monolingualtext, quantity, string, time, url, external-id, wikibase-item, wikibase-property, math

The parse API allows a list of values to be parsed (not just one at a time), so I have written the function to return a list of the parsed "objects": either just the value (for strings) or a Python dict with the keys and values supplied by the wbparsevalue API. In particular, parsing a list of quantity values returns a list of dicts with the keys 'amount', 'upperBound' and 'lowerBound', plus possibly other keys provided by the API (for example, 'unit' with value '1').
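A sketch of that result handling, assuming the JSON response has the shape `{"results": [{"raw": ..., "value": ..., "type": ...}, ...]}` with one entry per input value, and that errors come back as `{"error": {"code": ..., "info": ...}}`; check the shape against the live API. The helper name `parsed_values` is hypothetical, and the sample response is hand-written for illustration, not captured from Wikidata.

```python
from typing import Any, Dict, List


def parsed_values(response: Dict[str, Any]) -> List[Any]:
    """Return the parsed 'value' of each result, in input order.

    For string-like datatypes the value is a plain string; for structured
    datatypes such as quantity it is a dict (amount, upperBound, ...).
    """
    if 'error' in response:
        # Surface API errors instead of silently returning nothing
        raise ValueError(response['error'].get('info', 'wbparsevalue failed'))
    return [result['value'] for result in response['results']]


# Hand-written, illustrative response for datatype=quantity
sample = {
    'results': [{
        'raw': '1.5±0.1',
        'value': {'amount': '+1.5', 'unit': '1',
                  'upperBound': '+1.6', 'lowerBound': '+1.4'},
        'type': 'quantity',
    }]
}
quantities = parsed_values(sample)
```

Returning plain values and dicts keeps the wrapper independent of the pywikibot classes, which matches the plan of changing those classes in a separate step.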

Unfortunately, some existing pywikibot classes such as WbQuantity and WbTime do not work for this, because they use Decimal or long/int rather than string values (though it looks like WbMonolingualText could work). I thought changing those classes ought to be a separate step from providing access to the parse API.

@Multichill - could be; I'm not familiar with WbTime beyond a glance at the code. Are there edge cases (e.g. 10^20 years into the future?) that would break the int/long assumptions? But it definitely does NOT work for WbQuantity as things currently stand. Fixing WbQuantity seemed out of scope here, though it does need to be done. Coordinate may have similar issues, since it uses floats.
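To illustrate why the numeric representation matters: the API returns quantity amounts as signed decimal strings such as `'+0.1'`. Python's `decimal.Decimal` accepts those strings losslessly, whereas `float` rounds most decimal fractions to the nearest binary value. Python's `int` is arbitrary-precision, so a very large year does not overflow an int field as such. This is a generic Python illustration, not pywikibot code.

```python
from decimal import Decimal

amount_str = '+0.1'  # signed decimal string, as returned for quantity amounts

# Decimal keeps exactly the digits that were sent
exact = Decimal(amount_str)
# float rounds to the nearest binary fraction
approx = float(amount_str)

print(exact * 3 == Decimal('0.3'))  # True
print(approx * 3 == 0.3)            # False
# Python ints are unbounded, so a year like 10**20 fits without overflow
print(int('+1' + '0' * 20) == 10**20)  # True
```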

From the description of the Property class in pywikibot/page.py, the classes used for the different object types are:

  • ItemPage or PropertyPage
  • basestring
  • FilePage
  • Coordinate
  • WbTime
  • WbQuantity
  • WbMonolingualText

The parser doesn't seem to do anything special for wikibase-item types; creating the object would, I think, involve a separate API query. It seems to me each case would need to be handled at least a little differently.

One route would be a separate function to turn the parsed values into pywikibot objects - for example a special constructor for WbTime that takes the parsed values.
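A minimal sketch of such a constructor, assuming the parsed time value uses the Wikibase JSON shape (`time` as a signed string like `'+2013-01-01T00:00:00Z'`, plus `precision`, `calendarmodel` and similar keys). `ParsedTime` and `from_parsed` are hypothetical stand-ins, not the real WbTime API. Because the year is converted with Python's unbounded `int`, very large or negative years are handled.

```python
import re
from dataclasses import dataclass

# Signed year of any length, then two-digit month, day, hour, minute, second
_TIME_RE = re.compile(
    r'^([+-]\d+)-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$')


@dataclass
class ParsedTime:
    """Hypothetical stand-in for a WbTime-like class."""
    year: int
    month: int
    day: int
    hour: int
    minute: int
    second: int
    precision: int
    calendarmodel: str

    @classmethod
    def from_parsed(cls, value: dict) -> 'ParsedTime':
        """Build an instance from one parsed 'value' dict of type time."""
        match = _TIME_RE.match(value['time'])
        if match is None:
            raise ValueError('unexpected time format: %r' % value['time'])
        year, month, day, hour, minute, second = (
            int(group) for group in match.groups())
        return cls(year, month, day, hour, minute, second,
                   value['precision'], value['calendarmodel'])
```

For instance, `ParsedTime.from_parsed({'time': '-0044-03-15T00:00:00Z', 'precision': 11, 'calendarmodel': '...'})` yields `year == -44`.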

Item:
* https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=wikibase-item&values=Q42

Property:

FilePage:
* https://www.wikidata.org/w/api.php?action=wbparsevalue&datatype=commonsMedia&validate&values=Iguana_marina_(Amblyrhynchus_cristatus),_Las_Bachas,_isla_Baltra,_islas_Gal%C3%A1pagos,_Ecuador,_2015-07-23,_DD_23.JPG

Coordinate:

https://phabricator.wikimedia.org/diffusion/PWBC/browse/master/pywikibot/__init__.py;b6e933d4e33e1035457755015b29da74b8c8160a$304

Etc. All the output we get seems to match (and should match) the constructors we have for the objects.

This only makes sense for data types that are subclasses of WbRepresentation, see https://phabricator.wikimedia.org/diffusion/PWBC/browse/master/pywikibot/_wbtypes.py%3Bb6e933d4e33e1035457755015b29da74b8c8160a . These classes all have a fromWikibase function.

So the logic should probably be:

  • DataSite.parsevalue(datatype, value, options, validate)
  • Check that datatype is the name of a subclass of WbRepresentation
  • Send it to the server
  • If the result is ok, return <subclass we found>.fromWikibase(<the json>)

Something like that?

This way, if a new datatype gets added, this part of the code doesn't need to change.

Ok, the WbRepresentation superclass looks like it could help simplify this. But FilePage, ItemPage and PropertyPage (and basestring) are not subclasses of it, so I think just returning the JSON dict would be best for those. The function could certainly run fromWikibase for the other types, though; that seems pretty easy, and I'll look into it.
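A minimal sketch of the proposed DataSite.parsevalue logic combined with this refinement: look the datatype up in a registry, call `fromWikibase` when the class derives from the representation base class, and return the raw parsed JSON otherwise. All class names, the `DATATYPE_CLASSES` registry and the injected `query` callable are hypothetical stand-ins (pywikibot's real classes and request machinery are not used), so the sketch runs without network access.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, List, Optional


class Representation:
    """Hypothetical stand-in for pywikibot's WbRepresentation."""

    @classmethod
    def fromWikibase(cls, data: Dict[str, Any]) -> 'Representation':
        raise NotImplementedError


class Quantity(Representation):
    """Hypothetical WbQuantity-like class that stores Decimal amounts."""

    def __init__(self, amount: Decimal, unit: str = '1',
                 upper: Optional[Decimal] = None,
                 lower: Optional[Decimal] = None) -> None:
        self.amount, self.unit = amount, unit
        self.upper, self.lower = upper, lower

    @classmethod
    def fromWikibase(cls, data: Dict[str, Any]) -> 'Quantity':
        def dec(key: str) -> Optional[Decimal]:
            # Bounds are optional in the parsed value
            return Decimal(data[key]) if key in data else None
        return cls(Decimal(data['amount']), data.get('unit', '1'),
                   dec('upperBound'), dec('lowerBound'))


# Hypothetical registry from API datatype names to classes; datatypes missing
# here (wikibase-item, commonsMedia, string, ...) fall back to raw JSON.
DATATYPE_CLASSES: Dict[str, type] = {'quantity': Quantity}


def parsevalue(datatype: str, values: List[str],
               query: Callable[[Dict[str, str]], Dict[str, Any]]) -> List[Any]:
    """Parse values via wbparsevalue; `query` sends params, returns the JSON."""
    # Simplified: values containing '|' are not handled here
    response = query({'action': 'wbparsevalue', 'datatype': datatype,
                      'values': '|'.join(values)})
    raw = [result['value'] for result in response['results']]
    cls = DATATYPE_CLASSES.get(datatype)
    if cls is not None and issubclass(cls, Representation):
        return [cls.fromWikibase(value) for value in raw]
    return raw
```

Registering a new representation class in the mapping is then the only change needed when a datatype gains a `fromWikibase` constructor.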

Change 297637 had a related patch set uploaded (by Lokal Profil; owner: ArthurPSmith):
[pywikibot/core@master] Add wrapper function around wbparsevalue

https://gerrit.wikimedia.org/r/297637

Xqt triaged this task as Medium priority (Nov 30 2018, 11:15 AM).

Unassigning, I'm not working on this any more!

I might pick this up if nobody else beats me to it (go ahead). I actually have been using this a lot in my bots and it works quite well.

Xqt claimed this task.

Change 297637 merged by jenkins-bot:

[pywikibot/core@master] Add wrapper function around wbparsevalue

https://gerrit.wikimedia.org/r/297637