
more convenience functions for Lua
Open, Medium, Public

Description

The existing Lua functions we provide in Wikibase are very limited. Client projects are writing their own modules that duplicate functionality between the clients needlessly. We should offer a number of convenience functions for often-used operations.

Feedback round currently running here

Ideas for convenience functions:

  • ?
  • T185557 Create the easy function mw.wikibase.property('P21', 'Q8023', 'en')

Event Timeline

Lydia_Pintscher created this task.

@thiemowmde @hoo @eranroz @Tpt @Ladsgroup: You've been thinking about this for a while and talked to a number of people. Can you add the ones you believe are most important to the description? (Ideally with links to tasks if they exist.) Thank you!

thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.
thiemowmde added subscribers: aude, Jonas.
  • getAllStatements, both as an independent function (T176124), as well as on Entity objects (T166056).
    • Use-case: Maintenance code that intentionally checks all statements, including deprecated.
  • It was suggested to specify what the Entity returned by getEntity contains (T179638). For example, the user knows he needs only some specific statements, but not all of them. He wants to provide a list of these property IDs.
  • It was suggested to have a filter option for the existing getBestStatements that filters out no-value and some-value snaks.
    • Use case: Code skipping snaks with no actual value is repeated a lot in community-maintained code. Or they forget this edge-case and stuff breaks when it runs into a no/some-value snak.
  • A filter for getBestStatements that only returns statements with a specific reference.
    • Use case: When wikis can't decide on a preferred statement, but want to use their own preferred source.
  • A boolean entityExists (T143970).
    • Use case: Currently, I see a lot of code that does if getEntity( … ) then, which is super-expensive for no reason. The cheapest workaround that currently exists is getEntityUrl, but that's awkward to use in an if.
  • Some kind of boolean isChildOf or hasParent that recurses a tree up (T179155).
    • One parameter of the function is a list of properties that describe parent-relations, typically "subclass of" and "instance of". Ideally this should be a list and not only one property, but this does have a huge disadvantage: It's not a linear recursion then, but a tree! Instead of a maximum-depth, the algorithm must enforce a maximum on the number of entities processed.
    • The recursion stops when a specific item from a list of possible parents is found. Specifying only one parent item is definitely not enough!
    • The recursion also stops when it realizes it runs into a loop, or already processed the maximum number of entities. Personally, I don't think we should allow the user to specify their own maximum. Just decide on a sensible default.
    • Use case: Maintenance code that checks if all actors are (if you walk up the tree) properly tagged as "actor", without the need to list subclasses individually.
  • Some kind of recursive getParent that returns the entire parent entity found.
    • Use case: In the infobox of a city, I want to display information from the country, e.g. a link to the country's article, and the international dialing code. But cities don't necessarily have the "country" (P17) property. But I know I can traverse "located in the administrative territorial entity" (P131) up and stop at an item that is an "instance of" (P31) "country" (Q6256). As before, I might need to provide multiple parent-relations, as well as multiple conditions to identify a parent.
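
The isChildOf idea above could be sketched roughly as follows. This is only an illustration under assumptions, not an existing mw.wikibase function: the name isChildOf, the MAX_ENTITIES default, and the getParentIds callback are all hypothetical. In a real module the callback would presumably wrap mw.wikibase.getBestStatements; here it reads from a toy table so the sketch runs on its own. The traversal is breadth-first, remembers visited items to survive loops, and caps the number of entities processed rather than the depth.

```lua
-- Hypothetical sketch; none of these names exist in mw.wikibase.
local MAX_ENTITIES = 50 -- assumed fixed default, deliberately not user-configurable

-- getParentIds(id, propertyId) must return a list of item IDs.
local function isChildOf(startId, parentProperties, possibleParents, getParentIds)
    local wanted = {}
    for _, id in ipairs(possibleParents) do
        wanted[id] = true
    end

    local visited = { [startId] = true } -- loop protection
    local queue, head = { startId }, 1
    local processed = 0

    while queue[head] and processed < MAX_ENTITIES do
        local current = queue[head]
        head = head + 1
        processed = processed + 1
        for _, property in ipairs(parentProperties) do
            for _, parentId in ipairs(getParentIds(current, property)) do
                if wanted[parentId] then
                    return true
                end
                if not visited[parentId] then
                    visited[parentId] = true
                    queue[#queue + 1] = parentId
                end
            end
        end
    end
    return false
end

-- Toy data standing in for Wikidata (made-up item IDs).
local graph = {
    Q1 = { P31 = { "Q2" } },
    Q2 = { P279 = { "Q3" } },
    Q3 = { P279 = { "Q2", "Q4" } }, -- Q3 points back to Q2: a loop
}
local function getParentIds(id, property)
    return (graph[id] and graph[id][property]) or {}
end

print(isChildOf("Q1", { "P31", "P279" }, { "Q4", "Q99" }, getParentIds)) -- true
print(isChildOf("Q1", { "P31", "P279" }, { "Q99" }, getParentIds))       -- false
```

A recursive getParent could reuse the same loop and return the matching ID instead of true.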

All points raised by @thiemowmde are very good points. In addition:

  • Get statements with a language filter. It is very common, for example, to get the female form of label (P2521) in a specific language (see for example the nice work of @putnik in ru:Модуль:WikidataSelectors and Wikidata/item). Such properties are usually heavy (for example ~30 values in Q36180) but the module is interested in only one of them, so this can make it much easier and may even come with a small performance benefit.
  • A boolean entityExists (T143970).
    • Use case: Currently, I see a lot of code that does if getEntity( … ) then, which is super-expensive for no reason. The cheapest workaround that currently exists is getEntityUrl, but that's awkward to use in an if.

I actually like this idea but it should be noted that it currently does not actually work as getEntityUrl(eid) will happily return URLs for invalid entities. It could perhaps be changed to support such by checking if the entity actually exists and returning nil when it does not.

I do find it ironic that I can use getAllStatements(eid, pid) to get various claims on multiple entities without incurring an "expensive" hit but I get an "expensive" hit if I want to check if the same set of entities actually exist or not because I currently have to use getEntity(eid) (not to mention the resulting error message is hard to machine process).

The issues with more claim filtering and/or the "parent" cases mentioned above seem more applicable to a more generalized query interface as such things are actually quite simple to specify with SPARQL.

Currently without pulling the entire entity object with getEntity, we have some filtered access to the following: labels, descriptions, claims, and sitelinks. It should be noted there currently is no way to obtain any aliases data whatsoever without pulling the entire entity object.

In terms of wishlists, I would like to see something along the lines of a getProperties(eid) that lets me get a list of available property claims on an entity without having to pull the entity object with code like pids = getEntity(eid):getProperties(). And it would be cool if I could then send the same list through orderProperties(pids).

In short, my current wishlist items are:

  1. entity exists functionality without getEntity and the error it yields; it should handle Wikidata redirects from merged items, etc. perhaps something like resolvePropertyId but for mapping an eid to another or nil
  2. other ways to obtain entity IDs, e.g., T74815/T135442 (by reverse sitelink) and T99899 (reverse external identifier query)
  3. property list functionality without getEntity (à la getProperties)
  4. language filtered aliases access functionality without getEntity (à la getLabelWithLang and getLabelByLang)
  5. a getDescriptionByLang to align with getLabelByLang

I'm personally missing the ability to get "what links here" (see T185313).

I miss functions dealing with grammatical gender. For example, an immediate query to https://www.wikidata.org/wiki/Q37459 returns that Nicole Kidman is an "actor". Two more queries are necessary to fetch property 21 (sex or gender) from her item and property 2521 (female form of label) from the occupation item to reach the correct label, "actress". Since this is a common need that many of the local Lua modules do not solve, it would be nice to solve it directly in a single query.
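
The lookup described above might look something like this. It is a minimal sketch under assumptions: the function names textInLanguage and genderedLabel are made up for illustration, and the statement list is hand-written sample data shaped like what mw.wikibase.getBestStatements( occupationId, 'P2521' ) returns, so that the snippet runs outside a wiki.

```lua
local FEMALE = "Q6581072" -- Wikidata item for "female"

-- Returns the text of the first monolingual-text value in the given language.
local function textInLanguage(statements, lang)
    for _, statement in ipairs(statements or {}) do
        local snak = statement.mainsnak
        -- Skip no-value / some-value snaks, which carry no datavalue.
        if snak.snaktype == "value" and snak.datavalue.value.language == lang then
            return snak.datavalue.value.text
        end
    end
    return nil
end

-- Hypothetical helper: picks the female form when one exists, else the default label.
local function genderedLabel(genderId, defaultLabel, femaleFormStatements, lang)
    if genderId == FEMALE then
        return textInLanguage(femaleFormStatements, lang) or defaultLabel
    end
    return defaultLabel
end

-- Sample P2521 statements for an occupation item (hand-written, not fetched).
local femaleForms = {
    { mainsnak = { snaktype = "value", datavalue = { type = "monolingualtext",
        value = { text = "actrice", language = "fr" } } } },
    { mainsnak = { snaktype = "somevalue" } },
    { mainsnak = { snaktype = "value", datavalue = { type = "monolingualtext",
        value = { text = "actress", language = "en" } } } },
}

print(genderedLabel(FEMALE, "actor", femaleForms, "en")) -- actress
print(genderedLabel(FEMALE, "actor", femaleForms, "de")) -- actor (no German form in the sample)
```

The same textInLanguage loop would also serve the language filter requested earlier in this thread.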

The reason why different projects develop different modules is that different communities have different demands. Working on the English Wikipedia has led me to alter the original line of coding that I developed early on.

I think it's important to examine the demand for functionality in order to determine which functions would be most valuable. On en-wp there is already an acceptance that any locally supplied value must override whatever may exist on Wikidata. That may be easiest to code within a module. There is also consensus that Wikidata may be used within infoboxes, but not elsewhere (with minor exceptions). See https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Wikidata_Phase_2

Beyond that, the two demands that are strongest are that information pulled from Wikidata must be sourced (not just "Imported from xyz Wikipedia"), and that the use of Wikidata on any article must be able to be determined on an article-by-article basis. In other words, the fetching of Wikidata into an infobox has to be enabled by a conscious decision on each article. See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(policy)/Archive_128#RfC:_Wikidata_in_infoboxes,_opt-in_or_opt-out? and https://en.wikipedia.org/wiki/Template_talk:Infobox_book/Archive_8

The code to implement a whitelist (a flexible way to create "opt-in" infoboxes) is probably best left to the module.

There is code to meet those requirements in https://en.wikipedia.org/wiki/Module:WikidataIB but it would be valuable to have "smart" utility functions to encapsulate some of the common jobs, such as returning a table of formatted wikitext (taking into account links to redirects, which should be linked, and dab pages, which should not), as well as handling all date values according to precision and era, ranges of quantities, etc. In each case, the default needs to be to return only properly sourced values (with the option to return all values by setting a parameter).
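
To illustrate the date part of that request, here is a rough sketch of formatting a Wikibase time value according to its precision. The function name formatTime and the output format are assumptions made for this example, and it only handles year, month and day precision (9, 10 and 11 in the Wikibase data model). A negative sign is simply rendered as "BCE" here; a real implementation would need to take more care with calendar models and coarser precisions.

```lua
-- Hypothetical helper, not part of mw.wikibase.
-- timeValue is the "value" table of a time datavalue, e.g.
-- { time = "+2013-05-00T00:00:00Z", precision = 10 }
local function formatTime(timeValue)
    local sign, year, month, day =
        string.match(timeValue.time, "^([+-])(%d+)%-(%d%d)%-(%d%d)T")
    if not sign or timeValue.precision < 9 then
        return nil -- unparsable, or coarser than a year: not handled in this sketch
    end

    local text = tostring(tonumber(year)) -- drops leading zeros
    if timeValue.precision >= 10 then
        text = text .. "-" .. month
    end
    if timeValue.precision >= 11 then
        text = text .. "-" .. day
    end
    if sign == "-" then
        text = text .. " BCE"
    end
    return text
end

print(formatTime({ time = "+2001-00-00T00:00:00Z", precision = 9 }))  -- 2001
print(formatTime({ time = "+2013-05-00T00:00:00Z", precision = 10 })) -- 2013-05
print(formatTime({ time = "-0044-03-15T00:00:00Z", precision = 11 })) -- 44-03-15 BCE
```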

It would also be appreciated if a similar generalised function could retrieve values that are stored on Wikidata as values of a given qualifier i.e. value of property1 -> qualifier -> property2 as there is little consistency on how properties are stored.

Similarly, there is a demand for a function that scans a property prop1 in the current page (or another page if qid is given) and, for each value of the property that is a wikibase item, fetches all of the values of prop2; then for each value of prop2 it retrieves each qualifier and returns its value. That's potentially a hugely expensive function as it has to load multiple entities not directly associated with the current page using arbitrary access. There's some code to do that at the getValueQualIndirect function in https://en.wikipedia.org/wiki/Module:RexxS but I'm loath to make it more generally available unless the overhead can be significantly reduced. Something for thought by the developers.

  • At Wikivoyage (Modul:FastWikidata) we introduced some helpful and simple functions with support of @thiemowmde.
  • Searching for parent entities along a P31-P279 chain. Proposal: typeSearch( childId, idArray, limit ). childId is the entity id (or entity) to start the search. idArray contains P31 or P279 ids to look for. limit gives the maximum count of parent levels to search. typeSearch gives nil or a q id from the array. We need this search to get more general instances or classes. For instance: Q320366: we want to know that it is a train station (Q55488) instead of an interchange station (Q1147171).
  • A similar question is to ask for a state (see above) or first administrative division starting from any location.
  • We need a function providing an array with the first, n or all values of a property together with their qualifier values or ids. The qualifier itself is known. getPropertyWithQualifyer( id, p, q ). The result could be an array whose items each consist of the value and an array of qualifier values.
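
The result shape proposed in the last point could be sketched like this. It is an illustration only: valuesWithQualifier and snakValue are made-up names, the property and item IDs in the sample are placeholders, and the function works on an already fetched statement list (for example from mw.wikibase.getBestStatements( id, p )) so that it can run without a wiki.

```lua
-- Extracts a plain value from a snak; item values are reduced to their ID.
local function snakValue(snak)
    if snak.snaktype ~= "value" then
        return nil -- no-value / some-value snaks carry no datavalue
    end
    local value = snak.datavalue.value
    if snak.datavalue.type == "wikibase-entityid" then
        return value.id
    end
    return value
end

-- Hypothetical helper: for each statement, returns its value plus the values
-- of one known qualifier.
local function valuesWithQualifier(statements, qualifierId)
    local result = {}
    for _, statement in ipairs(statements) do
        local value = snakValue(statement.mainsnak)
        if value ~= nil then
            local qualifierValues = {}
            local snaks = (statement.qualifiers or {})[qualifierId] or {}
            for _, qualifierSnak in ipairs(snaks) do
                local qualifierValue = snakValue(qualifierSnak)
                if qualifierValue ~= nil then
                    qualifierValues[#qualifierValues + 1] = qualifierValue
                end
            end
            result[#result + 1] = { value = value, qualifiers = qualifierValues }
        end
    end
    return result
end

-- Hand-written sample statements with placeholder IDs.
local function itemSnak(id)
    return { snaktype = "value",
        datavalue = { type = "wikibase-entityid", value = { id = id } } }
end
local statements = {
    { mainsnak = itemSnak("Q100"), qualifiers = { P2 = { itemSnak("Q7"), itemSnak("Q8") } } },
    { mainsnak = { snaktype = "novalue" } },
    { mainsnak = itemSnak("Q200") },
}

for _, item in ipairs(valuesWithQualifier(statements, "P2")) do
    print(item.value, table.concat(item.qualifiers, ", "))
end
```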

At eswiki we mostly use it for the infoboxes and there are some things we have to do in many infoboxes that could be useful for others too:

  • Get the value of the qualifier of a known specific value for a property: For example, when we want to get the title in Spain for a given film, we need the value of the title qualifier for the claim has quality -> title for Spain (entity['claims']['P1552'] -> check if any of them has the value Q27847754 -> ['qualifiers']['P1476'][1])
  • Get the value of a property with a given qualifier and value: For example, when we want to display the different crew members that don't have their own property in Wikidata. If we want to display the art director we have to loop through all the film crew members and check who of them (if any) has a role of art director.
  • An easy way to sort the claims by label in a given language (for alphabetically sorted lists), by qualifier value (for dates or numeric values).
  • A way to get the most recent value for a property (like for the population of the cities and towns). Usually it should be set as preferred, and sorting these claims by date at Wikidata to make setting the most recent as preferred easier would work too.
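
For the last point, a module-side sketch of picking the most recent value could look like the following. The names pointInTime and mostRecentStatement are assumptions, and the input is sample data shaped like the result of mw.wikibase.getAllStatements( id, 'P1082' ). Time strings are compared as plain strings, which only works for CE dates with the same number of year digits; that shortcut is part of the sketch, not a recommendation.

```lua
local POINT_IN_TIME = "P585"

-- Returns the time string of the first "point in time" qualifier, or nil.
local function pointInTime(statement)
    local snaks = (statement.qualifiers or {})[POINT_IN_TIME]
    local snak = snaks and snaks[1]
    if snak and snak.snaktype == "value" then
        return snak.datavalue.value.time
    end
    return nil
end

-- Hypothetical helper: returns the statement with the latest P585 qualifier.
-- Statements without a dated qualifier are ignored.
local function mostRecentStatement(statements)
    local best, bestTime
    for _, statement in ipairs(statements) do
        local time = pointInTime(statement)
        if time and (bestTime == nil or time > bestTime) then
            best, bestTime = statement, time
        end
    end
    return best
end

-- Hand-written sample population statements (invented numbers).
local function populationStatement(amount, time)
    return {
        mainsnak = { snaktype = "value",
            datavalue = { type = "quantity", value = { amount = amount } } },
        qualifiers = time and { P585 = { { snaktype = "value",
            datavalue = { type = "time", value = { time = time, precision = 9 } } } } } or nil,
    }
end
local statements = {
    populationStatement("+1000", "+2001-00-00T00:00:00Z"),
    populationStatement("+1500", "+2020-00-00T00:00:00Z"),
    populationStatement("+1200", "+2011-00-00T00:00:00Z"),
    populationStatement("+900", nil), -- undated
}

print(mostRecentStatement(statements).mainsnak.datavalue.value.amount) -- +1500
```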

FYI, we started a feedback round on this page. All editors are welcome to give more information about the modules they use or improve, and what functions could be useful for them.

Perhaps something that allows "stated as" (P1932) values to be easily fetched? See, for example, discussion at https://en.wikipedia.org/wiki/Template_talk:Infobox_person/Wikidata#Stated_as

This is also important for author names.

It would be convenient if there were a function or variable to get the current wiki's site ID from, something like "mw.site.siteId".

On Wikipedia, I'm now trying to determine the site ID as follows, which is not a 100% reliable way:

siteId = (function() for i,v in pairs(mw.site.interwikiMap("local")) do if v.isCurrentWiki and i~="w" then return mw.ustring.gsub(i,"-","_").."wiki" end end end)()

I need this in order to get the badges of the article on the current wiki from the attached Wikidata entity object:

entity.sitelinks[siteId].badges

It would be convenient if there were a function or variable to get the current wiki's site ID from, something like "mw.site.siteId".

That is tracked in T194023: Expose siteGlobalID in Lua.

It would be great to have a getValidStatements() function that works the same as getAllStatements() but filters out deprecated statements. Deprecated Wikidata statements are often of no use outside of Wikidata, and use cases like listing the children of a person outside of Wikidata shouldn't list deprecated children. Without a dedicated function there are plenty of cases where users will forget to filter out deprecated statements and then show Wikipedia users data that Wikidata knows to be false as if it were valid.
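
Until something like that exists, the filtering has to be done in each module. A minimal sketch, assuming the statement list comes from mw.wikibase.getAllStatements( id, 'P40' ) and using the made-up name filterDeprecated (the sample data below is hand-written so the snippet runs on its own):

```lua
-- Hypothetical helper: keeps statements of rank "normal" and "preferred".
local function filterDeprecated(statements)
    local valid = {}
    for _, statement in ipairs(statements) do
        if statement.rank ~= "deprecated" then
            valid[#valid + 1] = statement
        end
    end
    return valid
end

-- Sample "child" (P40) statements with placeholder item IDs.
local children = {
    { rank = "normal", mainsnak = { snaktype = "value",
        datavalue = { type = "wikibase-entityid", value = { id = "Q101" } } } },
    { rank = "deprecated", mainsnak = { snaktype = "value",
        datavalue = { type = "wikibase-entityid", value = { id = "Q102" } } } },
    { rank = "preferred", mainsnak = { snaktype = "value",
        datavalue = { type = "wikibase-entityid", value = { id = "Q103" } } } },
}

for _, statement in ipairs(filterDeprecated(children)) do
    print(statement.mainsnak.datavalue.value.id) -- Q101, then Q103
end
```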

mw.wikibase.getAliasesByLang would be useful, so that it's not necessary to load the whole item to access the aliases.