
Add a new munge option to do blank node skolemization
Closed, Resolved, Public

Description

This munge option will transform all blank nodes into IRIs, following section 3.5 of the RDF 1.1 specification.

_:a8d14fa93486370345412093add8f50c will become http://www.wikidata.org/.well-known/genid/a8d14fa93486370345412093add8f50c
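The mapping above can be sketched in a few lines of Python. This is only an illustration, not the munger's actual code: the function names `skolemize` and `is_skolem` are hypothetical, the prefix is copied from the example in this description, and the label pattern is a simplification that only covers alphanumeric labels such as the hashes shown here.

```python
import re

# Prefix taken from the example in the task description; RDF 1.1 section 3.5
# recommends the "/.well-known/genid/" path for skolem IRIs.
GENID_PREFIX = "http://www.wikidata.org/.well-known/genid/"

# Simplified assumption: blank node labels are purely alphanumeric.
_BNODE = re.compile(r"^_:([A-Za-z0-9]+)$")


def skolemize(term: str) -> str:
    """Return the skolem IRI for a blank node term, or the term unchanged."""
    match = _BNODE.match(term)
    if match is None:
        return term  # not a blank node, leave it alone
    return GENID_PREFIX + match.group(1)


def is_skolem(iri: str) -> bool:
    """Detect skolem IRIs through their common prefix."""
    return iri.startswith(GENID_PREFIX)
```

Because the label is carried over verbatim, the result is only stable if the blank node label itself is stable, which is why the Wikibase change referenced in this description matters.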

Wikibase is expected to generate stable and unique labels for its blank nodes (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/583897).

Event Timeline

dcausse created this task. Feb 18 2020, 5:30 PM

In https://www.wikidata.org/wiki/Q4115189#Q4115189$7d68afee-408d-1c1e-946b-43d8d37a17b5 @Lucas_Werkmeister_WMDE added more "somevalue" values (in references and qualifiers), which produces the following RDF output:

wd:Q4115189 p:P370 s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 .

s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P370 _:genid6 ;
	pq:P2315 "this is a demo for T244341, if possible please don’t remove it before, say, 2020-02-26 :)"@en ;
	pq:P370 _:genid7 ;
	pq:P1106 _:genid8 ;
	prov:wasDerivedFrom ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 .

ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 a wikibase:Reference ;
	pr:P370 _:genid9 ;
	pr:P855 _:genid10 .

The first observation is that our current update strategy cannot apply a change to this entity cleanly: existing blank nodes are leaked (updated T231515).

The proposed solution for encoding bnodes, as currently stated, does not work well because it would conflate all bnodes attached to the same statement.

One obvious solution would be to encode more information in this made-up IRI by prefixing/suffixing the predicate:

wd:Q4115189 p:P370 s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 .

s:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5 a wikibase:Statement,
		wikibase:BestRank ;
	wikibase:rank wikibase:NormalRank ;
	ps:P370 wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PS-P370 ;
	pq:P2315 "this is a demo for T244341, if possible please don’t remove it before, say, 2020-02-26 :)"@en ;
	pq:P370 wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PQ-P370 ;
	pq:P1106 wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PQ-P1106 ;
	prov:wasDerivedFrom ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 .

ref:6c8b1cd1c3cd814ab99e3c40580f12024ceff994 a wikibase:Reference ;
	pr:P370 wdsome:ref-6c8b1cd1c3cd814ab99e3c40580f12024ceff994-PR-P370 ;
	pr:P855 wdsome:ref-6c8b1cd1c3cd814ab99e3c40580f12024ceff994-PR-P855 .
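As a rough sketch of how such labels could be composed, assuming the local name is simply the statement or reference ID joined with the upper-cased predicate prefix and the property ID, as in the output above. The helper name `structured_label` is hypothetical and not part of Wikibase.

```python
def structured_label(node_id: str, role: str, prop: str) -> str:
    """Build a label such as '<statement-id>-PS-P370'.

    node_id: statement ID, or 'ref-<hash>' for a reference (as in the example)
    role:    predicate prefix ('ps', 'pq' or 'pr')
    prop:    property ID, e.g. 'P370'
    """
    return f"{node_id}-{role.upper()}-{prop}"


statement = "Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5"

# Three somevalue nodes hang off the same statement. Keying the label on the
# statement ID alone would give all three the same label; adding the
# predicate keeps them apart.
labels = {
    structured_label(statement, "ps", "P370"),
    structured_label(statement, "pq", "P370"),
    structured_label(statement, "pq", "P1106"),
}
```

Note that this scheme only stays unique as long as a given predicate appears at most once with a somevalue on the same statement or reference.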

This is a bit ugly, but it would ensure uniqueness of the IRIs. That said, I'm not a big fan of propagating information into IDs, as I'm afraid that some process may start making assumptions about the structure of the ID itself. Here the only things we want to encode are:

  • uniqueness of the node
  • a common IRI prefix to detect that these are skolem IRIs.

I wonder whether we could simply hash things: wdsome:Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PS-P370 would become wdsome:e81da6d67fa0cbf0e1daf440c31cf138ffe565c8
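A minimal sketch of the hashing idea follows. The comment does not name a hash function, so SHA-1 is an assumption here (chosen only because the example label is 40 hex characters long), and this snippet is not claimed to reproduce the exact value quoted above. `hashed_label` is a hypothetical name.

```python
import hashlib


def hashed_label(structured: str) -> str:
    """Hash a structured label into an opaque, fixed-length hex label.

    Assumption: SHA-1 over the UTF-8 bytes of the structured label.
    """
    return hashlib.sha1(structured.encode("utf-8")).hexdigest()


ps_label = hashed_label("Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PS-P370")
pq_label = hashed_label("Q4115189-7d68afee-408d-1c1e-946b-43d8d37a17b5-PQ-P370")
```

The hash is deterministic, so the label stays stable across exports, while consumers can no longer infer the statement or predicate from the ID. Detecting skolem IRIs then relies solely on the common IRI prefix.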

Change 583897 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/Wikibase@master] [rdf] generate stable labels for blank nodes

https://gerrit.wikimedia.org/r/583897

Maintenance_bot moved this task from incoming to in progress on the Wikidata board. Apr 1 2020, 9:15 AM

Change 585542 had a related patch set uploaded (by DCausse; owner: DCausse):
[wikidata/query/rdf@master] Add new munger option to enable blank node skolemization

https://gerrit.wikimedia.org/r/585542

Addshore added a subscriber: Addshore.

Moving to waiting on our board, as the aim will be to merge this next week

dcausse updated the task description. Apr 30 2020, 5:18 PM
dcausse updated the task description.

Change 585542 merged by jenkins-bot:
[wikidata/query/rdf@master] Add new munger option to enable blank node skolemization

https://gerrit.wikimedia.org/r/585542

Change 583897 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] [rdf] generate stable labels for blank nodes

https://gerrit.wikimedia.org/r/583897

@dcausse I don't think there is anything actionable from our side here, so I'll put it in Done on our side.
I'll leave you to deal with closing this ticket and anything else that is needed.

Gehel closed this task as Resolved. Jul 13 2020, 12:54 PM