
rdf:type of statement in WDQS seems to be missing
Closed, Invalid, Public

Description

I have a hard time understanding the ontology of the properties as they appear in the Wikidata Query Service (WDQS) in connection with statements. In the N-Triples output generated from the wikidata.org website I get:

<http://www.wikidata.org/entity/statement/Q80-f415fcf7-4fec-59f7-7793-e71aa5874323> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology-beta#Statement> .
<http://www.wikidata.org/entity/statement/Q80-f415fcf7-4fec-59f7-7793-e71aa5874323> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology-beta#BestRank> .

This is in accordance with the documentation at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Statement_types

But using the WDQS with

SELECT ?p1 ?value 
WHERE {  <http://www.wikidata.org/entity/statement/Q80-f415fcf7-4fec-59f7-7793-e71aa5874323> ?p1 ?value . }

I get rdf:type wikibase:BestRank but not rdf:type wikibase:Statement, which I would have expected as well. When I check the data on wikidata.org, the statement has normal rank (https://www.wikidata.org/wiki/Q80#P734), so the "best rank" also seems strange to me.
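A narrower query can show the rank and the types of this statement side by side. This is only a sketch: it assumes the current wikibase: namespace (http://wikiba.se/ontology#) rather than the ontology-beta namespace seen in the N-Triples above, and it declares the prefixes explicitly even though WDQS predefines them.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wikibase: <http://wikiba.se/ontology#>

# List the rank of the statement together with any rdf:type triples stored for it.
SELECT ?rank ?type
WHERE {
  <http://www.wikidata.org/entity/statement/Q80-f415fcf7-4fec-59f7-7793-e71aa5874323> wikibase:rank ?rank .
  # OPTIONAL keeps the row even if no rdf:type triple is present.
  OPTIONAL {
    <http://www.wikidata.org/entity/statement/Q80-f415fcf7-4fec-59f7-7793-e71aa5874323> rdf:type ?type .
  }
}
```

As far as I can tell from the dump format documentation, wikibase:BestRank is also assigned to normal-rank statements when the property has no preferred-rank statement on that entity, which may account for the second observation.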

Event Timeline


Another related issue: SELECT * WHERE { wd:Q80 a ?object . } should return wikibase:Item, if I understand the documentation at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Entity_representation correctly, with its example of wd:Q3 a wikibase:Item ;.

Curious whether anyone else thinks this would be worth revisiting. I came across this whilst looking into why the RDF in BlazeGraph seemed to be lacking rdf:type whereas the RDF exports included it. Some of the other items mentioned on the data differences page are cases of redundant information; rdf:type, however, is not easily substituted with anything else currently put into BlazeGraph. So if you are trying to target a SPARQL query at specific kinds of elements in the graph, this isn't easy (if it is possible at all).

Performance was mentioned several times as a reason for not including the elements and I wonder if that would still be considered a valid reason for not adding some of them.

Maybe we could revisit this on another query engine (T206560), but I think the basic facts haven’t changed: it’s a lot of extra triples with little added value.

Here’s a list of workarounds in case it’s useful:

  • ?item a wikibase:Item → ?item wikibase:sitelinks []
  • (?property a wikibase:Property isn’t removed)
  • ?lexeme a wikibase:Lexeme → ?lexeme a ontolex:LexicalEntry
  • ?form a wikibase:Form → ?form a ontolex:Form
  • ?sense a wikibase:Sense → ?sense a ontolex:LexicalSense
  • ?statement a wikibase:Statement → ?statement wikibase:rank []
  • ?reference a wikibase:Reference is the only one I don’t have a direct replacement for; best approach is probably [] prov:wasDerivedFrom ?reference but you’ll have to take into account that this may produce multiple solutions if there are several statements with the same reference
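As a minimal sketch of the statement and reference workarounds combined, assuming the standard WDQS prefixes (declared explicitly here) and using P734 on Q80 from the task description purely as an example:

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wikibase: <http://wikiba.se/ontology#>

# DISTINCT collapses the duplicate solutions that appear when
# several statements share the same reference node.
SELECT DISTINCT ?reference
WHERE {
  wd:Q80 p:P734 ?statement .
  # Stand-in for "?statement a wikibase:Statement".
  ?statement wikibase:rank [] ;
             # Stand-in for "?reference a wikibase:Reference".
             prov:wasDerivedFrom ?reference .
}
```

Selecting ?statement as well would bring the duplicates back on purpose, which is useful if you need to know which statements share a reference.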

Yes, I'd seen the ongoing investigation into other engines and, like you say, it may well be addressable there. I just wondered if some changes had occurred to BlazeGraph in the interim.

Thank you for those workarounds, very helpful. I'd overlooked the sitelinks count on items (still getting familiar with every detail). I'll give this a try along with a few of the others for some of the query patterns I was looking at.