
Provide JSON-LD support for Wikidata
Closed, Resolved · Public

Description

Provide JSON-LD support for Wikidata

Details

Related Gerrit Patches:

Event Timeline

Lea_Lacroix_WMDE triaged this task as Normal priority. Oct 16 2018, 12:59 PM
Lea_Lacroix_WMDE created this task.
Restricted Application removed a project: Patch-For-Review. Oct 16 2018, 12:59 PM

JsonLD is now enabled on beta, see announcement

Addshore moved this task from incoming to in progress on the Wikidata board. Oct 16 2018, 1:50 PM
Tpt added a comment. Edited Oct 16 2018, 2:39 PM

Not a problem but a cosmetic proposal: Instead of having the structure:

{
    "@graph": [
        {
            "@id": "wdata:Q64",
            "@type": "schema:Dataset",
            "about": "wd:Q64",
            ...a
        },
        {
            "@id": "wd:Q64",
            "@type": "wikibase:Item",
            ...b
        }
    ],
    "@context": ...c
}

it would be nice to output:

{
    "@id": "wdata:Q64",
    "@type": "schema:Dataset",
    "about": {
        "@id": "wd:Q64",
        "@type": "wikibase:Item",
        ...b
    },
    ...a
    "@context": ...c
}

The two structures encode exactly the same RDF graph, but the new one would let clients get the item via the .about path instead of the more fragile .@graph[1] path.
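To make the fragility argument concrete, here is a minimal Python sketch. It assumes heavily trimmed, hypothetical versions of the two layouts above (the real output carries many more keys and a context), and the helper names are illustrative only. With the "@graph" form, a client that wants to avoid relying on array position has to look the item up by "@id"; with the nested form it can follow "about" directly.

```python
# Hypothetical, heavily trimmed versions of the two layouts discussed above.
flat_doc = {
    "@graph": [
        {"@id": "wdata:Q64", "@type": "schema:Dataset", "about": "wd:Q64"},
        {"@id": "wd:Q64", "@type": "wikibase:Item"},
    ]
}
nested_doc = {
    "@id": "wdata:Q64",
    "@type": "schema:Dataset",
    "about": {"@id": "wd:Q64", "@type": "wikibase:Item"},
}


def item_from_graph(doc):
    """Find the item without relying on its position in @graph."""
    nodes = {node["@id"]: node for node in doc["@graph"]}
    # Locate the dataset node, then follow its "about" reference by @id.
    dataset = next(n for n in doc["@graph"] if n.get("@type") == "schema:Dataset")
    return nodes[dataset["about"]]


def item_from_nested(doc):
    """With the nested layout the item is simply the value of "about"."""
    return doc["about"]
```

Both helpers return the same node for these inputs; the difference is only in how much work the client has to do.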

@Tpt
If I understand you correctly, the structure you propose would fail if there is more than one entity, though. That is typically the case, since we return stub representations of referenced entities along with the data about the requested entity.

Tpt added a comment. Oct 16 2018, 3:55 PM

@daniel We could do something similar for stubs, with structures like:

{
    ...
    "property": {
        "@id": "wd:Q64",
        "label": { "@value": "Foo", "@language": "en" }
    },
    ...
}

instead of

{
    "@graph": [
        {
            ...
            "property": "wd:Q64",
            ...
        },
        {
            "@id": "wd:Q64",
            "label": { "@value": "Foo", "@language": "en" }
        }
    ]
}

This structure may make client processing more convenient, and it looks more like a "usual" JSON REST API.

But my proposed structure does indeed rule out returning descriptions of unrelated entities.
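The trade-off can be sketched in Python. This is only an illustration under stated assumptions: the function name embed_stubs is made up, the subject identifier "ex:subject" and the unrelated node "wd:Q1" are placeholders, and any string value that equals a node's "@id" is treated as a reference (real JSON-LD decides that from the context). It is not how Purtle works.

```python
def embed_stubs(graph_doc, root_id):
    """Inline nodes referenced from the root; drop everything unreachable.

    Assumption: a string value equal to some node's @id is a reference.
    """
    nodes = {node["@id"]: node for node in graph_doc["@graph"]}

    def expand_value(value, seen):
        if isinstance(value, str) and value in nodes and value not in seen:
            # Replace the reference with the full node, guarding against cycles.
            return expand_node(nodes[value], seen | {value})
        if isinstance(value, list):
            return [expand_value(v, seen) for v in value]
        return value

    def expand_node(node, seen):
        # Keywords such as @id and @type are copied through untouched.
        return {
            key: value if key.startswith("@") else expand_value(value, seen)
            for key, value in node.items()
        }

    return expand_node(nodes[root_id], {root_id})


graph_doc = {
    "@graph": [
        {"@id": "ex:subject", "property": "wd:Q64"},
        {"@id": "wd:Q64", "label": {"@value": "Foo", "@language": "en"}},
        {"@id": "wd:Q1", "label": {"@value": "Unrelated", "@language": "en"}},
    ]
}
embedded = embed_stubs(graph_doc, "ex:subject")
```

In this sketch the stub for wd:Q64 ends up nested under "property", while the unreachable wd:Q1 node has nowhere to go, which is the limitation described above.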

We only emit the "@graph" form in purtle if the API is used to annotate more than a single entity: https://github.com/wikimedia/purtle/blob/master/src/JsonLdRdfWriter.php#L37

So you're not really asking for a change in the JSON-LD, you're asking for wikidata to only emit a single entity on the /entity/* endpoint -- and that would/could apply to all the different representations using the purtle backend. That's not a JSON-LD specific request.

Tpt added a comment. Oct 16 2018, 8:26 PM

So you're not really asking for a change in the JSON-LD, you're asking for wikidata to only emit a single entity on the /entity/* endpoint -- and that would/could apply to all the different representations using the purtle backend. That's not a JSON-LD specific request.

Not really. In my proposal the same number of RDF resources is still described (in my first example, at least wdata:Q64 and wd:Q64). I am proposing a new Purtle feature that would make use of the JSON-LD embedding feature [1], and probably also of the nested rdf:Description tags of RDF/XML and the short blank node syntax of Turtle [2]. But it is indeed probably out of scope for this task. I'm going to open a new task instead.

[1] https://www.w3.org/TR/json-ld/#embedding
[2] https://www.w3.org/TR/turtle/#unlabeled-bnodes

@Tpt embedding will be hard to do due to the streaming nature of purtle. But let's discuss that when you file the new ticket.

According to the mailing list (Wikidata Digest, Vol 83, Issue 18), this is now enabled on beta. Yet requesting https://wikidata.beta.wmflabs.org/wiki/Special:EntityData/Q64.jsonld does not work?

Christopher added a comment. Edited Oct 24 2018, 12:39 AM

Thanks, I look forward to this being deployed. JSON-LD will be very useful for Wikidata, particularly framing. You might want to consider providing the context as a remote link to reduce payload size (and "noise" in the data). Here is that test entity, framed on the playground. Notice how it merges the statements and references.
jsonld-playground framed wd item
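As a rough illustration of the payload point, the Python sketch below swaps an inline "@context" object for a URL string, which JSON-LD permits. The context URL and the helper name are placeholders, not anything Wikidata publishes, and the inline context is a tiny made-up excerpt.

```python
import json

# Placeholder URL; the thread does not say where a shared context would live.
CONTEXT_URL = "https://example.org/wikibase-context.jsonld"


def with_remote_context(doc, context_url):
    """Return a shallow copy of doc whose @context is a link, not an object."""
    slim = dict(doc)
    slim["@context"] = context_url
    return slim


inline_doc = {
    "@context": {
        "wikibase": "http://wikiba.se/ontology#",
        "statements": {"@id": "wikibase:statements"},
        "sitelinks": {"@id": "wikibase:sitelinks"},
    },
    "@id": "wd:Q64",
    "@type": "wikibase:Item",
}
remote_doc = with_remote_context(inline_doc, CONTEXT_URL)

inline_size = len(json.dumps(inline_doc))
remote_size = len(json.dumps(remote_doc))
```

The saving grows with the size of the real context. The cost is that consumers have to fetch (and ideally cache) the context document before they can interpret the data.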

Why these two? I don't see any way that this will have any subtasks.

abian awarded a token. Nov 22 2018, 2:41 PM
abian added a subscriber: abian. Nov 22 2018, 3:52 PM

What is preventing this from being enabled on production?

Addshore added a subscriber: Addshore.

I guess that is up to @Lydia_Pintscher :)

Addshore changed the task status from Open to Stalled. Jun 24 2019, 10:43 PM

From my side we're ready to go to production as soon as @Tpt says his concerns further up have been addressed. @Tpt Care to chime in?

Tpt added a comment. Jun 27 2019, 6:21 PM

@Lydia_Pintscher Improving JSON-LD structure like I proposed requires a strong refactoring of Purtle. I would not commit to do it anytime soon and I believe no one else would. So, I don't think it's a good idea to block JSON-LD deployment for that.

Lydia_Pintscher changed the task status from Stalled to Open. Jun 27 2019, 6:33 PM

Ok cool! Then let's go ahead.

@Lydia_Pintscher Improving JSON-LD structure like I proposed requires a strong refactoring of Purtle. I would not commit to do it anytime soon and I believe no one else would. So, I don't think it's a good idea to block JSON-LD deployment for that.

If you look into refactoring Purtle at some point, note that many of its quirks are there because it has been hand optimized for speed and memory usage.

WMDE-leszek added a comment. Edited Jun 28 2019, 6:57 AM

Ok cool! Then let's go ahead.

For the sake of good development practice, I'd insist that the same functionality be brought back to the beta instance before it is enabled on wikidata.org. That is, T226472 should be fixed first.

Yeah sounds good.

Do you foresee any changes to the context/vocabulary/ontology in the future (e.g. implementing processing features of JSON-LD 1.1)? How will context changes be versioned / published?

Could the ontology not also be dereferenceable as a JSON-LD context? Then you could use @vocab to provide a default for the wikibase properties and types. (e.g. "@vocab": "http://wikiba.se/ontology-1.0.jsonld")
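For readers unfamiliar with @vocab, here is a deliberately simplified Python sketch of term expansion. It is not the full JSON-LD expansion algorithm, and the function name is illustrative. One thing worth noting: @vocab acts as an IRI prefix that is concatenated with the term, so the sketch assumes the namespace http://wikiba.se/ontology# (the "wikibase" prefix value used in the emitted context) rather than a .jsonld document URL.

```python
def expand_term(term, context):
    """Expand a term to an IRI using a small subset of JSON-LD context rules."""
    if term.startswith("@"):
        return term  # keywords are never expanded
    definition = context.get(term)
    if isinstance(definition, dict):
        definition = definition.get("@id")
    if isinstance(definition, str) and definition != term:
        # Explicit term definition: expand whatever it maps to.
        return expand_term(definition, context)
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        if isinstance(context.get(prefix), str):
            return context[prefix] + suffix  # compact IRI such as wikibase:Item
        return term  # already an absolute IRI
    vocab = context.get("@vocab")
    return vocab + term if vocab else term


# Explicit per-term definitions, in the style of the current context.
explicit_context = {
    "wikibase": "http://wikiba.se/ontology#",
    "statements": {"@id": "wikibase:statements"},
}
# A single default vocabulary instead (assumed namespace, see lead-in).
vocab_context = {"@vocab": "http://wikiba.se/ontology#"}

explicit_iri = expand_term("statements", explicit_context)
vocab_iri = expand_term("statements", vocab_context)
```

Under these assumptions both contexts expand "statements" to the same IRI, which is the appeal of @vocab: one line replaces a definition per term.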

Lea_Lacroix_WMDE added a comment. Edited Jul 4 2019, 11:33 AM

This should be deployed on July 8th. I'll take care of the announcement.

Change 521221 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Enable jsonld output format for wikibase entities everywhere

https://gerrit.wikimedia.org/r/521221

Change 521221 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable jsonld output format for wikibase entities everywhere

https://gerrit.wikimedia.org/r/521221

Mentioned in SAL (#wikimedia-operations) [2019-07-08T11:16:53Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:521221|Enable jsonld output format for wikibase entities everywhere (T207168)]] (duration: 00m 49s)

daniel removed a subscriber: daniel. Jul 8 2019, 2:15 PM

Something is amiss with these...not found.

"wikibase": "http://wikiba.se/ontology#",
"statements": {
    "@id": "wikibase:statements"
},
"identifiers": {
    "@id": "wikibase:identifiers"
},
"sitelinks": {
    "@id": "wikibase:sitelinks"
},

Something is amiss with these...not found.

Those are URIs not URLs, they don't need to be found... or am I missing something?

@dbarratt I could not find those properties in the OWL document returned for the Wikibase ontology. Sorry, I'm getting caught up with your schema layouts as fast as I can :-) I expected my parser to retrieve information about their description, range, and domain. I do see the class "Statement", however.