[Story] switch default rdf format to full (include statements)
Closed, Resolved (Public)

Description

Currently the RDF output that includes statements is only available via a format switch: https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl?flavor=full
It should be the default.
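
For illustration, a minimal sketch of how a client currently has to ask for the statement-bearing output. The URL pattern is taken from the link above; the helper name `entity_data_url` and its parameters are hypothetical and not part of any Wikibase API.

```python
from urllib.parse import urlencode

# Base path copied from the URL in the task description.
BASE = "https://www.wikidata.org/wiki/Special:EntityData"


def entity_data_url(entity_id, fmt="ttl", flavor=None):
    """Build a Special:EntityData URL (hypothetical helper).

    With flavor=None no query string is added, so the server
    decides which flavor to return.
    """
    url = f"{BASE}/{entity_id}.{fmt}"
    if flavor is not None:
        url += "?" + urlencode({"flavor": flavor})
    return url


# Explicit switch, as required today to get statements:
print(entity_data_url("Q42", flavor="full"))
# No switch; this task asks for this form to return the full output too:
print(entity_data_url("Q42"))
```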

Event Timeline

Lydia_Pintscher raised the priority of this task to High.
Lydia_Pintscher updated the task description.

Hi Lydia (and all), it's Chiara from the BBC here. I'm posting here so that everyone working on this issue can reply to me. In brief: is there any idea of when this task will be implemented? Thanks!

We're working on the blockers but it's not clear when we'll finish them, sorry.

We want the mapping to be mostly stable when we turn this on by default. To that end, it would be very helpful to get feedback about the RDF data you can already get with flavor=full.

Thanks Daniel, my colleague Alex and I will come up with some tests to evaluate the RDF data. Are there any questions you would like us to investigate?

Dear all,
I've just sent an email with our comments to Lydia, thanks for your patience!
Cheers,
Chiara

Lydia_Pintscher renamed this task from "switch default rdf format to full (include statements)" to "[Story] switch default rdf format to full (include statements)". Aug 13 2015, 8:11 PM
Lydia_Pintscher set Security to None.

On the mailing list, Stas raised the question of which RDF should be delivered by the linked data URIs by default. Our dumps contain data in multiple encodings (simple and complex), and the PHP code can now create several variants of RDF based on parameters.

I think the default should be to simply return all data that is in the dumps. This would address the BBC's use case of building a linked data crawler that fetches live data rather than using dumps. Such a crawler would not have any way to specify which part of RDF is needed, since linked data is such an extremely simple, parameter-free API.
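
To illustrate the "parameter-free" point, here is a rough sketch of what such a crawler's request might look like. It assumes the entity URI form `http://www.wikidata.org/entity/<ID>` and that the format is chosen only through the HTTP `Accept` header; the helper name `linked_data_request` is made up for this example. The request is only constructed here, not sent.

```python
from urllib.request import Request


def linked_data_request(entity_id, mime_type="text/turtle"):
    """Build a plain linked data request (hypothetical helper).

    The only knob a generic crawler has is the Accept header;
    there is no place to pass something like flavor=full.
    """
    uri = f"http://www.wikidata.org/entity/{entity_id}"
    return Request(uri, headers={"Accept": mime_type})


req = linked_data_request("Q42")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch; whatever the server returns for that request is, by definition, the default this task is about.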

The dump format, however, does not contain data on the referenced entities (the dump has all entities anyway, so there is no reason to repeat them), while the full flavor does. I'm not sure whether that fits the use case mentioned.

Data on the referenced entities does not have to be included as long as one can get this data by resolving these entities' URIs. However, some basic data (ontology header, license information) should be in each single entity export.
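
As a sketch of what "resolving these entities' URIs" could mean on the consumer side: a crawler can pull the referenced entity IDs out of one export and queue them for fetching. This assumes entity URIs appear in full form as `http://www.wikidata.org/entity/<ID>` in the serialization (as in N-Triples; prefixed Turtle names would need a real RDF parser). The function name is hypothetical.

```python
import re

# Matches full entity URIs for items (Q...) and properties (P...).
ENTITY_URI = re.compile(r"http://www\.wikidata\.org/entity/([QP]\d+)")


def referenced_entity_ids(rdf_text, own_id):
    """Return the IDs of entities mentioned in rdf_text, excluding own_id.

    Hypothetical helper: a plain text scan, not an RDF parse.
    """
    ids = set(ENTITY_URI.findall(rdf_text))
    ids.discard(own_id)
    return sorted(ids)
```

Each returned ID can then be resolved with a separate request, which is why the export itself does not strictly need to carry data about those entities.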

I believe that "stub" data on referenced entities should be included by default, for convenience. That's also how the feature was originally specced with Denny.

Including more data (within reason) will not be a problem (other than a performance/bandwidth problem for your servers).

However, if there are further ideas and small improvements that will take time to implement, it would be good to switch to "dump" as the default right now. It is already a big improvement over the current (statement-free) default. Further improvements can then be done in small steps.

Change 242492 had a related patch set uploaded (by Hoo man):
Set the default flavor to full in EntityDataSerializationService

https://gerrit.wikimedia.org/r/242492

@mkroetzsch IIRC, the "dump" mode does not include information about referenced entities, which makes it inconvenient for third parties. And it doesn't resolve redirects, which violates the same-as semantics. "dump" mode should really only be used for dumps.

Change 242492 merged by jenkins-bot:
Set the default flavor to full in EntityDataSerializationService

https://gerrit.wikimedia.org/r/242492

hoo claimed this task.
hoo removed a project: Patch-For-Review.

Please note that this is not going to be deployed before October 14 (but possibly later).