
RDF export generates wrong IDs for federated entities
Closed, Resolved, Public

Description

When generating RDF for a MediaInfo entity, I get this:

@prefix wd: <https://federated-commons.wmflabs.org/entity/> .
@prefix wdt: <https://federated-commons.wmflabs.org/prop/direct/> .
@prefix p: <https://federated-commons.wmflabs.org/prop/> .


wd:M10694 a wikibase:Mediainfo,
		schema:MediaObject ;
	schema:caption "Tacoma"@en ;
	rdfs:label "Tacoma"@en ;
	wdt:P64 wd:Q224 ;
	p:P64 wds:M10694-0934c1f0-42ee-760c-ebfe-06c0a698b3d6 .

wds:M10694-0934c1f0-42ee-760c-ebfe-06c0a698b3d6 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:PreferredRank ;
	ps:P64 wd:Q224 ;
	pq:P80 wd:Q260 .

However, Q3349 is a federated Wikidata entity, so it should not have the wd: prefix but the wd-wikidata: one instead. The same goes for P64, which is also federated.

So we have the following problems here:

  • Q-items and M-items have the same prefix
  • properties and Q-items use the Commons prefix, while they need to use the Wikidata prefix
  • we want the Commons prefix to be sdc, not wd (this is the least important part; we can start with any prefix, but not before the above is fixed, since it produces wrong URLs)
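To make the first two problems concrete, here is a minimal sketch that expands the prefixed names from the output above using that output's own @prefix declarations. It is plain Python with no RDF library assumed; the helper names expand and namespace are illustrative only, not Wikibase code.

```python
from typing import Dict

# Prefix declarations copied from the RDF output quoted above.
PREFIXES: Dict[str, str] = {
    "wd": "https://federated-commons.wmflabs.org/entity/",
    "wdt": "https://federated-commons.wmflabs.org/prop/direct/",
    "p": "https://federated-commons.wmflabs.org/prop/",
}


def expand(prefixed_name: str, prefixes: Dict[str, str]) -> str:
    """Expand a Turtle prefixed name such as 'wd:Q224' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


def namespace(iri: str) -> str:
    """Return everything up to and including the last '/' of an IRI."""
    return iri[: iri.rfind("/") + 1]


media = expand("wd:M10694", PREFIXES)  # local MediaInfo entity
item = expand("wd:Q224", PREFIXES)     # federated Wikidata item
prop = expand("wdt:P64", PREFIXES)     # federated Wikidata property

# Problem 1: the local M-entity and the federated Q-item share one namespace.
print(namespace(media) == namespace(item))  # prints True
# Problem 2: the federated item points at the Commons test wiki, not Wikidata.
print(item)  # prints https://federated-commons.wmflabs.org/entity/Q224
```

Both the local M-entity and the federated Q-item land in the same https://federated-commons.wmflabs.org/entity/ namespace, which is exactly the collision described in the first two bullets.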


Event Timeline

Looking at the code, the ID is plain P64 even though it comes from a different repository, and the ID returns an empty string as its repository name. So I am confused about how exactly this is supposed to work. @Lucas_Werkmeister_WMDE, @WMDE-leszek - any guidance on this?

Smalyshev triaged this task as Medium priority. May 1 2019, 8:07 PM

@Smalyshev I believe the issue you have noticed is due to RDF export not yet being fully operational on Commons. We (WMDE) have worked on the assumption that RDF output for MediaInfo entities is not yet needed, and hence did not prioritize RDF topics in this area.
I will revisit this all next week. The possibly good news is that most of the implementation was already done a few weeks ago; what is left is mostly finishing touches, so I hope to report back here soon with the issue fixed.

Quickly looking over this ticket:

However, Q3349 is a federated Wikidata entity, so it should not have the wd: prefix but the wd-wikidata: one instead. The same goes for P64, which is also federated.

Within Wikibase itself, the whole idea of prefixes currently does not exist (we removed it).

Right now, any prefixless Q items should be assumed to live on wikidata.org, and the same goes for prefixless L and P entities. Prefixless M entities should be assumed to live on commons.wikimedia.org.
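The rule above can be sketched as a lookup from the entity type letter to the concept URI base of the wiki that owns it. The URI bases are the ones used in the target output quoted later in this task (http://www.wikidata.org/entity/ and http://commons.wikimedia.org/entity/); the function name concept_uri and the lookup table are hypothetical, and only top-level IDs such as Q224 or M10694 are handled (no lexeme forms or senses).

```python
import re

# Hypothetical table derived from the rule above: which wiki hosts each
# prefixless entity type, expressed as that wiki's concept URI base.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"
COMMONS_ENTITY = "http://commons.wikimedia.org/entity/"

BASE_BY_TYPE = {
    "Q": WIKIDATA_ENTITY,  # items
    "P": WIKIDATA_ENTITY,  # properties
    "L": WIKIDATA_ENTITY,  # lexemes
    "M": COMMONS_ENTITY,   # MediaInfo entities
}

# Top-level IDs only: a type letter followed by a positive integer.
ID_PATTERN = re.compile(r"^([QPLM])[1-9][0-9]*$")


def concept_uri(entity_id: str) -> str:
    """Map a prefixless entity ID to the concept URI of the wiki that owns it."""
    match = ID_PATTERN.match(entity_id)
    if match is None:
        raise ValueError(f"unsupported entity ID: {entity_id!r}")
    return BASE_BY_TYPE[match.group(1)] + entity_id
```

With this table, concept_uri("Q224") returns "http://www.wikidata.org/entity/Q224", while concept_uri("M10694") returns "http://commons.wikimedia.org/entity/M10694", so the two IDs can no longer share a URI space.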

If we look to the future, where for example we might want properties on multiple wikis, I think the easiest way for this to happen would be that an unprefixed P on any Wikimedia wiki would still always refer to properties on Wikidata, and if siteX also wanted properties, these could be prefixed everywhere, e.g. siteX:P123.
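A sketch of that proposal, assuming a hypothetical siteX repository whose base URL (under the reserved example.org domain) is a placeholder: unprefixed property IDs resolve to Wikidata, and only explicitly prefixed IDs are routed elsewhere. The names split_id, property_uri and PREFIXED_BASES are illustrative.

```python
from typing import Dict, Optional, Tuple

# Unprefixed property IDs always resolve here (Wikidata's concept URI base).
DEFAULT_BASE = "http://www.wikidata.org/entity/"

# Hypothetical: "siteX" stands in for a future wiki with its own properties;
# the URL is a placeholder, not a real endpoint.
PREFIXED_BASES: Dict[str, str] = {
    "siteX": "https://sitex.example.org/entity/",
}


def split_id(serialization: str) -> Tuple[Optional[str], str]:
    """Split 'siteX:P123' into ('siteX', 'P123'); an unprefixed ID gets None."""
    repo, sep, local = serialization.rpartition(":")
    return (repo, local) if sep else (None, serialization)


def property_uri(serialization: str) -> str:
    """Unprefixed IDs mean Wikidata; prefixed IDs use their own repository."""
    repo, local = split_id(serialization)
    if repo is None:
        return DEFAULT_BASE + local
    if repo not in PREFIXED_BASES:
        raise KeyError(f"unknown repository prefix: {repo!r}")
    return PREFIXED_BASES[repo] + local
```

Under this scheme property_uri("P64") stays "http://www.wikidata.org/entity/P64" on every wiki, and a second property source only adds entries to the prefixed table instead of changing the meaning of existing IDs.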

I say all of this because I think I disagree with the sentence I quoted: perhaps Q3349 should always have the wd: prefix, and if another site ever has extra items, those would get a different prefix?

Well, the real problem is not the prefixes but that we're using wd: here for both M-entities and Q-entities, even though the former need URIs starting with http://commons.wikimedia.org/entity/ and the latter http://www.wikidata.org/entity/. So the main problem is not the prefix but the underlying URI; the prefix is only an expression of it. These two types of entities cannot share the same prefix and the same URI space. How we express this is the tricky part. We can of course define wd: as always being the Wikidata prefix (see T222995), but we need to figure out how to configure these prefixes and URIs so that they point to the correct namespaces.
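One way to think about the configuration problem: give every entity source its own Turtle prefix and its own URI space, and refuse any configuration in which two sources share either. The sketch below is hypothetical: the SOURCES layout and the rule of deriving the statement prefix by appending "s" are assumptions, chosen to match wd:/wds: on Wikidata and the sdc:/sdcs: names used in this task. It is not how Wikibase itself reads its settings.

```python
from typing import Dict, List

# Hypothetical per-source configuration: the Turtle prefix for each wiki's
# entities and the base URI its entity namespaces hang off.
SOURCES: Dict[str, Dict[str, str]] = {
    "wikidata": {"prefix": "wd", "concept_base": "http://www.wikidata.org/"},
    "commons": {"prefix": "sdc", "concept_base": "http://commons.wikimedia.org/"},
}


def prefix_declarations(sources: Dict[str, Dict[str, str]]) -> List[str]:
    """Emit @prefix lines for the entity and statement namespaces of each source.

    Raises ValueError if two sources would share a prefix or a URI space,
    which is the collision visible in the original RDF output.
    """
    prefix_owner: Dict[str, str] = {}
    uri_owner: Dict[str, str] = {}
    lines: List[str] = []
    for name, cfg in sources.items():
        namespaces = {
            cfg["prefix"]: cfg["concept_base"] + "entity/",
            cfg["prefix"] + "s": cfg["concept_base"] + "entity/statement/",
        }
        for prefix, uri in namespaces.items():
            if prefix in prefix_owner:
                raise ValueError(f"prefix {prefix!r} used by {prefix_owner[prefix]} and {name}")
            if uri in uri_owner:
                raise ValueError(f"URI {uri} used by {uri_owner[uri]} and {name}")
            prefix_owner[prefix] = name
            uri_owner[uri] = name
            lines.append(f"@prefix {prefix}: <{uri}> .")
    return lines
```

With the two sources above, prefix_declarations(SOURCES) yields the wd:, wds:, sdc: and sdcs: declarations; pointing both sources at the same concept_base, as effectively happened in the original output, raises ValueError instead of silently merging the two URI spaces.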

FWIW, the PoC patch I put together a while ago that should "fix" this is https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/489447/.
I am working on getting the patch reviewable again.

@Smalyshev I've finally rebased https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/489447/ to be able to try this out.
I am setting up the WikibaseMediaInfo extension locally now, but maybe you'll have a chance to give it a try before then.

According to T222995, this is what we want to get:

@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix p: <http://www.wikidata.org/prop/> .
@prefix ps: <http://localhost/prop/statement/> .
@prefix sdc: <http://commons.wikimedia.org/entity/> .
@prefix sdcs: <http://commons.wikimedia.org/entity/statement/> .

sdc:M10659 a wikibase:Mediainfo,
		wikibase:MediaInfo,
		schema:MediaObject ;
	schema:caption "(Mostly) empty road near Zion National Park, at dusk"@en ;
	rdfs:label "(Mostly) empty road near Zion National Park, at dusk"@en ;
	skos:prefLabel "(Mostly) empty road near Zion National Park, at dusk"@en ;
	schema:name "(Mostly) empty road near Zion National Park, at dusk"@en ;
	wdt:P64 wd:Q3349 ;
	p:P64 sdcs:M10659-9d7a22d3-46af-89b8-8e51-41a8a47c009f .

sdcs:M10659-9d7a22d3-46af-89b8-8e51-41a8a47c009f a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P64 wd:Q3349 .
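For illustration, the statement part of the target output can be produced by a small renderer that hard-codes the prefix split from T222995: the MediaInfo subject and its statement node use sdc:/sdcs:, while the property predicates and the item value stay in the Wikidata namespaces. statement_turtle is a hypothetical helper, and it assumes the statement is best-ranked, which is why it always emits both the wdt: triple and wikibase:BestRank.

```python
from typing import List


def statement_turtle(media_id: str, prop: str, item: str, guid: str,
                     rank: str = "NormalRank") -> List[str]:
    """Render one best-ranked, item-valued statement on a MediaInfo entity.

    Local nodes (the MediaInfo entity and its statement) use the Commons
    prefixes sdc:/sdcs:; federated predicates and values keep wd:/wdt:/p:/ps:.
    """
    statement = f"sdcs:{media_id}-{guid}"
    return [
        f"sdc:{media_id} wdt:{prop} wd:{item} ;",  # direct (truthy) claim
        f"\tp:{prop} {statement} .",               # link to the full statement
        "",
        f"{statement} a wikibase:Statement,",
        "\t\twikibase:BestRank ;",
        f"\twikibase:rank wikibase:{rank} ;",
        f"\tps:{prop} wd:{item} .",               # statement's main value
    ]


print("\n".join(statement_turtle(
    "M10659", "P64", "Q3349", "9d7a22d3-46af-89b8-8e51-41a8a47c009f")))
```

This reproduces the wdt:/p: triples and the statement block of the example above, without the type and label triples.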

Change 527149 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Added prefixes to document (data) URIs in the RDF output

https://gerrit.wikimedia.org/r/527149

Change 517021 had a related patch set uploaded (by Smalyshev; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Prefix RDF (turtle) namespaces with configurable prefixes instead of using a source/repository suffix

https://gerrit.wikimedia.org/r/517021

Change 528519 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Prefix statement, reference and value namespaces in RDF output

https://gerrit.wikimedia.org/r/528519

Change 517021 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Prefix RDF (turtle) namespaces with configurable prefixes instead of using a source/repository suffix

https://gerrit.wikimedia.org/r/517021

Smalyshev claimed this task.

Change 527149 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Added prefixes to document (data) URIs in the RDF output

https://gerrit.wikimedia.org/r/527149

Change 528519 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Prefix statement, and reference namespaces in RDF output

https://gerrit.wikimedia.org/r/528519