duplicated lines in all Wikidata entities with statements in LD formats
Closed, Resolved (Public)

Description

Problem: The line…

<rdf:type rdf:resource="http://wikiba.se/ontology#Property"/>

… is duplicated in all the rdf:Description blocks that describe statements of Items, Properties, Lexemes, Forms and Senses in RDF (*.rdf). Analogous lines are also duplicated in at least Notation3 (*.n3), Turtle (*.ttl), N-Triples (*.nt) and JSON-LD (*.jsonld).

Examples:

  • Item Q42 in RDF, first duplication out of 241 (the line appears 482 times):
	<rdf:Description rdf:about="http://www.wikidata.org/entity/P31">
		<rdf:type rdf:resource="http://wikiba.se/ontology#Property"/>
		<rdf:type rdf:resource="http://wikiba.se/ontology#Property"/>
		<wikibase:propertyType rdf:resource="http://wikiba.se/ontology#WikibaseItem"/>
		[…]
	</rdf:Description>
  • Property P101 in Notation3:
wd:P101 a wikibase:Property,
		wikibase:Property ;
  • Lexeme L2 in Turtle:
wd:P5831 a wikibase:Property,
		wikibase:Property ;
  • Form L2-F1 in N-Triples:
<http://www.wikidata.org/entity/P898> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property> .
<http://www.wikidata.org/entity/P898> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property> .
  • Sense L2-S1 in JSON-LD:
"@type": [
    "wikibase:Property",
    "wikibase:Property"
],
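For anyone who wants to reproduce the count on an N-Triples dump, here is a minimal Python sketch. It only compares statement lines textually, so it assumes the serializer writes identical triples as byte-identical lines (as in the Form L2-F1 example above). The function name `find_duplicate_triples` and the `SAMPLE` data are made up for illustration; the third sample line is the P31 propertyType statement from the Q42 example, rewritten by hand as N-Triples.

```python
from collections import Counter


def find_duplicate_triples(ntriples_text: str) -> dict[str, int]:
    """Return every statement line that occurs more than once, with its count.

    Purely textual comparison: blank lines and '#' comment lines are skipped,
    and two triples only count as duplicates if their lines are identical.
    """
    counts = Counter(
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical sample: the duplicated pair from the report plus one unique line.
SAMPLE = """\
<http://www.wikidata.org/entity/P898> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property> .
<http://www.wikidata.org/entity/P898> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property> .
<http://www.wikidata.org/entity/P31> <http://wikiba.se/ontology#propertyType> <http://wikiba.se/ontology#WikibaseItem> .
"""

if __name__ == "__main__":
    for line, n in find_duplicate_triples(SAMPLE).items():
        print(f"{n}x {line}")
```

The same check could serve as a rough test of the acceptance criteria: after a fix, the function should return an empty dict for the `rdf:type` lines.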

Acceptance criteria:

  • The content <rdf:type rdf:resource="http://wikiba.se/ontology#Property"/> in RDF, or its analogue in other formats, is no longer unnecessarily duplicated.

Event Timeline

@Lydia_Pintscher, do you know to which project/team this task would belong?

Change 683719 had a related patch set uploaded (by Hoo man; author: Hoo man):

[mediawiki/extensions/Wikibase@master] PropertyRdfBuilder::addProperty: Don't duplicate type

https://gerrit.wikimedia.org/r/683719

Is this problematic for any particular reason? As far as I understand, it’s generally acceptable to repeat triples in RDF (they just have no effect).

(We can still fix this, of course, I’m just curious if it’s more important for some reason I’m not aware of.)

I don't know; I guess not, apart from the extra bytes that are stored, transmitted and processed. Perhaps certain applications also display this information (or the result of processing it) twice?
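To make both points concrete, here is a rough Python sketch, not taken from Wikibase. It treats an RDF graph as a set of triples, so the repeated statement collapses to one, while the serialized text still carries the extra bytes. `parse_simple_ntriples` is a hypothetical helper that only handles lines made of three IRIs; a real consumer would use a proper RDF parser.

```python
def parse_simple_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Split IRI-only N-Triples lines into (subject, predicate, object) tuples.

    Simplifying assumption: no literals or blank nodes, one statement per line.
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the terminating " ." and split into the three terms.
        subject, predicate, obj = line.rstrip(" .").split(" ", 2)
        triples.append((subject, predicate, obj))
    return triples


STATEMENT = (
    "<http://www.wikidata.org/entity/P898> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://wikiba.se/ontology#Property> ."
)
serialized = STATEMENT + "\n" + STATEMENT + "\n"

triples = parse_simple_ntriples(serialized)  # what a naive line reader sees
graph = set(triples)                         # what the RDF graph contains
redundant_bytes = len(serialized.encode("utf-8")) - len((STATEMENT + "\n").encode("utf-8"))

if __name__ == "__main__":
    print(f"statements read: {len(triples)}, triples in graph: {len(graph)}")
    print(f"redundant bytes in the serialization: {redundant_bytes}")
```

An application that keeps the list rather than the set is the kind of consumer that might show the type twice.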

Change 683719 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] PropertyRdfBuilder::addProperty: Don't duplicate type

https://gerrit.wikimedia.org/r/683719

abian assigned this task to hoo.

You fixed it. \o/