
Unexpected title value in WDQS when using MWAPI for lexemes
Open, Needs Triage, Public, Bug Report

Description

List of steps to reproduce (step by step, including full links if applicable):

SELECT ?title WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";  # Set the project here
                    wikibase:api "Generator";
                    mwapi:generator "recentchanges";
                    mwapi:grcnamespace "146";              # Restrict to lexemes
                    mwapi:grctype "new";
                    mwapi:grcuser "Ainali".                # Your username
    ?title wikibase:apiOutputItem mwapi:title.
  }
  FILTER(BOUND(?title))
} LIMIT 1

What happens?:
It gives a result which in JSON looks like this:

{"head":{"vars":["title"]},"results":{"bindings":[{"title":{"type":"uri","value":"http://www.wikidata.org/entity/LEXEME:L496172"}}]}}

The query service renders that as wd:LEXEME:L496172, a link that resolves to https://www.wikidata.org/wiki/Lexeme:LEXEME:L496172, which returns a 404 with the error message "This entity does not exist."

What should have happened instead?:
I would have expected ?title to be either the URI http://www.wikidata.org/entity/L496172 or the string Lexeme:L496172 (as in a regular API call).

Event Timeline

Well, I would say it’s not really a bug, more an unfortunate combination of features. There are currently two ways to get an entity ID out of MWAPI:

Given a “client-like” API result like this

<page _idx="32882493" pageid="32882493" ns="0" title="Mary E. Cobb" contentmodel="wikitext" pagelanguage="en" pagelanguagehtmlcode="en" pagelanguagedir="ltr" touched="2021-06-28T08:26:03Z" lastrevid="1030833675" length="7412">
  <pageprops defaultsort="Cobb, Mary E." page_image_free="Mary_E._Cobb.jpg" wikibase_item="Q6779361" />
</page>

then the MWAPI config

?item wikibase:apiOutputItem mwapi:item.

corresponds to the XPath pageprops/@wikibase_item, i.e. the wikibase_item attribute of the pageprops child element of that page element.

Given a “repo-like” API result like this

<page _idx="102358522" pageid="102358522" ns="0" title="Q107110037" contentmodel="wikibase-item" pagelanguage="en" pagelanguagehtmlcode="en" pagelanguagedir="ltr" touched="2021-06-08T03:16:28Z" lastrevid="1437430459" length="1963">
  <pageprops wb-claims="3" wb-identifiers="0" wb-sitelinks="1" />
</page>

then the MWAPI config

?item wikibase:apiOutputItem mwapi:title.

corresponds to the XPath @title, i.e. the title attribute of that page element.
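To make the two lookups concrete, here is a rough Python sketch that mimics them with the standard library's ElementTree. It is not how MWAPI is implemented, only an illustration of the two XPaths. The sample pages are trimmed copies of the results above; the lexeme page is a made-up reduction carrying only the namespace and title from the report, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the "client-like" and "repo-like" results quoted above.
CLIENT_LIKE = """<page pageid="32882493" ns="0" title="Mary E. Cobb">
  <pageprops wikibase_item="Q6779361" />
</page>"""

REPO_LIKE = """<page pageid="102358522" ns="0" title="Q107110037">
  <pageprops wb-claims="3" wb-identifiers="0" wb-sitelinks="1" />
</page>"""

# Assumption: a reduced lexeme page with only the attributes known from the report.
LEXEME_LIKE = '<page ns="146" title="Lexeme:L496172" />'


def select_item(page_xml):
    """Mimic mwapi:item, i.e. the XPath pageprops/@wikibase_item."""
    pageprops = ET.fromstring(page_xml).find("pageprops")
    return None if pageprops is None else pageprops.get("wikibase_item")


def select_title(page_xml):
    """Mimic mwapi:title, i.e. the XPath @title."""
    return ET.fromstring(page_xml).get("title")


print(select_item(CLIENT_LIKE))   # entity ID taken from the page props
print(select_title(REPO_LIKE))    # for an item, the title is the entity ID
print(select_title(LEXEME_LIKE))  # for a lexeme, the title carries a namespace prefix
```

For the lexeme page the selected title is Lexeme:L496172 rather than a bare entity ID, which is the value that ends up in the entity URI.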

In both cases, the assumption is that the XPath selects the entity ID as a string, but for the title attribute, that’s only true for items, not other entity types. (On Wikibase installations where items aren’t in the main namespace, I assume it’s not even true for items.) But I’m not sure what the best way to solve this would be – I would be hesitant to hard-code something like “remove everything up to the first colon” into MWAPI.
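Until something like that exists, a consumer of the query results could repair the URI on their own side. The following is a minimal Python sketch of the "remove everything up to the first colon" idea applied client-side; the function name is hypothetical, and the rule inherits the same weakness that makes it unattractive to hard-code in MWAPI (it assumes entity IDs never contain a colon).

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def repair_entity_uri(uri):
    """Drop a namespace prefix such as "LEXEME:" from the local part of an entity URI.

    URIs outside the entity prefix, and local names without a colon,
    are returned unchanged.
    """
    if not uri.startswith(ENTITY_PREFIX):
        return uri
    local_name = uri[len(ENTITY_PREFIX):]
    # Keep only what follows the first colon, if there is one.
    entity_id = local_name.split(":", 1)[-1]
    return ENTITY_PREFIX + entity_id


print(repair_entity_uri("http://www.wikidata.org/entity/LEXEME:L496172"))
print(repair_entity_uri("http://www.wikidata.org/entity/Q107110037"))
```

Applied to the value from the report, this yields the http://www.wikidata.org/entity/L496172 form the reporter expected, while item URIs pass through untouched.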

Maybe we can add a new value for the query+prop API parameter, which would add the page’s entity ID (as an attribute on the page element, like the contentmodel, pagelanguage, etc.)? And then we could add that by default to the MWAPI “Generator” service, and add a new shortcut, mwapi:entityid = @wikibase_entity or something like that.