Include links in Wikidata HTTP responses to different entity representations as Link headers
Closed, Resolved · Public

Description

We should include HTTP Link headers (Link: <URI>; rel="describedby", Link: <URI>; rel="alternate", etc.) in Wikidata HTTP responses for a given entity to point to all of its different representations, plus useful metadata.

These headers let data users and their tools make decisions without having to load and process the entire resource. They are important if we want the data to be seriously used outside the Wikimedia projects.
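As an illustration of how a client might act on such headers without fetching the body, here is a minimal Python sketch that parses a Link header value and picks the URL for a wanted media type. The names `parse_link_header` and `find_alternate` are hypothetical, the sample value is a shortened copy of the Q5 response shown later in this task, and the regular expressions only cover the quoted and token parameter forms seen in this task, not everything the Link header grammar allows.

```python
import re

# One link-value: <URI> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^",;\s]+))*)'
)
# One parameter: ;name="quoted value" or ;name=token
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^",;\s]+))')


def parse_link_header(value):
    """Return a list of (url, params) tuples from a Link header value."""
    links = []
    for match in _LINK_RE.finditer(value):
        url, raw_params = match.group(1), match.group(2)
        params = {}
        for p in _PARAM_RE.finditer(raw_params):
            quoted, token = p.group(2), p.group(3)
            params[p.group(1).lower()] = quoted if quoted is not None else token
        links.append((url, params))
    return links


def find_alternate(links, media_type):
    """Return the first rel="alternate" URL with the given type, or None."""
    for url, params in links:
        if params.get("rel") == "alternate" and params.get("type") == media_type:
            return url
    return None


header = (
    '<https://www.wikidata.org/wiki/Special:EntityData/Q5.json>; '
    'rel="alternate"; type="application/json",'
    '<https://www.wikidata.org/wiki/Special:EntityData/Q5.ttl>; '
    'rel="alternate"; type="text/turtle"'
)
print(find_alternate(parse_link_header(header), "text/turtle"))
```

A tool that only understands Turtle can stop here and request that one URL, which is the kind of decision the headers are meant to enable.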

Example: DBpedia

$ curl --head http://dbpedia.org/page/Concept | grep Link:

DBpedia includes, in HTTP Link headers:

  • <http://creativecommons.org/licenses/by-sa/3.0/>; rel="license"
  • <http://dbpedia.org/data/Concept.rdf>; rel="alternate"; type="application/rdf+xml"; title="Structured Descriptor Document (RDF/XML format)"
  • <http://dbpedia.org/data/Concept.n3>; rel="alternate"; type="text/n3"; title="Structured Descriptor Document (N3/Turtle format)"
  • <http://dbpedia.org/data/Concept.json>; rel="alternate"; type="application/json"; title="Structured Descriptor Document (RDF/JSON format)"
  • <http://dbpedia.org/data/Concept.atom>; rel="alternate"; type="application/atom+xml"; title="OData (Atom+Feed format)"
  • <http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE%20%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FConcept%3E&format=text%2Fcsv>; rel="alternate"; type="text/csv"; title="Structured Descriptor Document (CSV format)"
  • <http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE%20%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FConcept%3E&format=text%2Fcxml>; rel="alternate"; type="text/cxml"; title="Structured Descriptor Document (CXML format)"
  • <http://dbpedia.org/data/Concept.ntriples>; rel="alternate"; type="text/plain"; title="Structured Descriptor Document (N-Triples format)"
  • <http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE%20%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FConcept%3E&format=application%2Fmicrodata%2Bjson>; rel="alternate"; type="application/microdata+json"; title="Structured Descriptor Document (Microdata/JSON format)"
  • <http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE%20%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FConcept%3E&format=text%2Fhtml>; rel="alternate"; type="text/html"; title="Structured Descriptor Document (Microdata/HTML format)"
  • <http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE%20%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FConcept%3E&format=application%2Fld%2Bjson>; rel="alternate"; type="application/ld+json"; title="Structured Descriptor Document (JSON-LD format)"
  • <http://dbpedia.org/resource/Concept>; rel="http://xmlns.com/foaf/0.1/primaryTopic"
  • <http://dbpedia.org/resource/Concept>; rev="describedby"
  • <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/page/Concept>; rel="timegate"

Example: BNE

$ curl --head http://datos.bne.es/persona/XX1718747.html | grep Link:

The National Library of Spain (Biblioteca Nacional de España) includes, in HTTP Link headers:

  • <http://datos.bne.es/persona/XX1718747.rdf> ; rel="alternate"; type="application/rdf+xml"; title="Structured Descriptor Document (RDF/XML format)"
  • <http://datos.bne.es/persona/XX1718747.ttl> ; rel="alternate"; type="text/turtle"; title="Structured Descriptor Document (Turtle format)"

Useful links

Event Timeline

abian created this task. May 28 2017, 2:10 PM
Restricted Application added a subscriber: Aklapper. May 28 2017, 2:10 PM
Lydia_Pintscher triaged this task as Low priority. Jun 11 2017, 5:18 PM
Lydia_Pintscher added a subscriber: daniel.
abian added a comment. May 1 2018, 6:03 PM

Do you see this as feasible/convenient, @daniel?

abian renamed this task from Include links in Wikidata HTTP responses to different entity descriptions as HTTP Link headers to Include links in Wikidata HTTP responses to different entity representations as Link headers. May 1 2018, 6:44 PM
abian added a project: SEO. Sep 17 2019, 1:04 PM

Any thoughts on this?

@Addshore Do you see any reason not to do this?

I see no reason not to do this.

Let's do it then :)
@abian Do you want to propose a patch?

abian added a comment. Sep 27 2019, 9:10 PM

Okay! But I'd need some help to know what/where to touch. To set HTTP headers for the Special:EntityData pages I would probably change /repo/includes/LinkedData/EntityDataRequestHandler.php, but what should I touch if I want to set HTTP headers for the wiki pages (not necessarily loading them through Special:EntityData)?

I would go with the place where we put the og:description meta tags, but I'm not sure whether it's possible to add response headers there (it's in ViewEntityAction.php).

ViewEntityAction has access to OutputPage via $this->getOutput(), which has addLinkHeader; that is what you need, afaik.

Change 547739 had a related patch set uploaded (by Abián; owner: Abián):
[mediawiki/extensions/Wikibase@master] Include Link headers in HTTP responses

https://gerrit.wikimedia.org/r/547739

What is this waiting for/stalled on?

The patch needs to address some code review concerns :)

abian added a comment. Jun 7 2020, 10:35 AM

I'm sorry I put this aside (my fault). I didn't include tests, but I feel I don't know the architecture well enough to write them; I sense that these will be complex or that I'll need to touch code elsewhere. I think it's better for me to officially give up on these tests so that a developer on the team (or other volunteer) can code them. I'll list a few checks I have in mind, but please don't consider this list essential or exhaustive:

  • A request for a page that is not a Wikibase entity does not generate a response with links to alternative structured representations.
  • A request for a Wikibase Item generates a response with links to alternative structured representations.
  • A request for a Wikibase Property generates a response with links to alternative structured representations.
  • The link headers are well formatted.
  • All the alternative structured representations listed are available.
  • All the alternative structured representations available are listed.
  • All the alternative structured representations listed are about the requested Wikibase entity.
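As a rough illustration of the "well formatted" check in the list above, here is a minimal Python sketch. It is not the test code from the patch (Wikibase is PHP); the helper name `is_well_formed` is hypothetical, and the pattern assumes the entry shape reported later in this task, `<url>; rel="alternate"; type="<type>"`, with entries joined by commas. That is stricter than the general Link header syntax.

```python
import re

# One entry: <url>; rel="alternate"; type="<type>"
_ENTRY = r'<[^<>\s]+>; rel="alternate"; type="[^"]+"'
# The whole header value: one or more entries separated by commas.
_HEADER_RE = re.compile(rf'{_ENTRY}(?:,{_ENTRY})*')


def is_well_formed(value):
    """True if the whole value consists only of entries in the expected shape."""
    return _HEADER_RE.fullmatch(value) is not None


sample = (
    '<http://localhost/wiki1/index.php/Special:EntityData/Q50.json>; '
    'rel="alternate"; type="application/json",'
    '<http://localhost/wiki1/index.php/Special:EntityData/Q50.ttl>; '
    'rel="alternate"; type="text/turtle"'
)
print(is_well_formed(sample))
```

Using `fullmatch` rather than `search` means a stray trailing comma or a missing `>` makes the whole value fail, which is usually what a format test wants.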

@Lydia_Pintscher This is currently sitting in the waiting / stalled column of the campsite iteration board.
If we want to work on this ticket & patch then it's probably best for it to jump off the board and be prioritized (as we would be doing more than code review on it now).

I added tests, this is ready for review.

Local output:

Link: <http://localhost/wiki1/index.php/Special:EntityData/Q50.json>; rel="alternate"; type="application/json",<http://localhost/wiki1/index.php/Special:EntityData/Q50.php>; rel="alternate"; type="application/vnd.php.serialized",<http://localhost/wiki1/index.php/Special:EntityData/Q50.rdf>; rel="alternate"; type="application/rdf+xml",<http://localhost/wiki1/index.php/Special:EntityData/Q50.n3>; rel="alternate"; type="text/n3",<http://localhost/wiki1/index.php/Special:EntityData/Q50.ttl>; rel="alternate"; type="text/turtle",<http://localhost/wiki1/index.php/Special:EntityData/Q50.nt>; rel="alternate"; type="application/n-triples",<http://localhost/wiki1/index.php/Special:EntityData/Q50.jsonld>; rel="alternate"; type="application/ld+json"

(746 bytes.) Note that this includes the “PHP” format, which I’d really rather remove altogether (T98035); I wonder if we’ll see an uptick in PHP-format requests from scrapers following these links?
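For readers who want to see how a value like the one above is put together, here is a small Python sketch that rebuilds it from a map of file extensions to media types. This is only an illustration, not the Wikibase implementation (which is PHP); `FORMATS` and `build_link_header` are hypothetical names, and the map is copied from the local output above.

```python
# Extension -> media type, in the order they appear in the local output.
FORMATS = {
    "json": "application/json",
    "php": "application/vnd.php.serialized",
    "rdf": "application/rdf+xml",
    "n3": "text/n3",
    "ttl": "text/turtle",
    "nt": "application/n-triples",
    "jsonld": "application/ld+json",
}


def build_link_header(entity_data_base, entity_id, formats=FORMATS):
    """Join one rel="alternate" entry per format into a single header value."""
    entries = [
        f'<{entity_data_base}/{entity_id}.{ext}>; rel="alternate"; type="{mime}"'
        for ext, mime in formats.items()
    ]
    return ",".join(entries)


value = build_link_header(
    "http://localhost/wiki1/index.php/Special:EntityData", "Q50"
)
line = f"Link: {value}"
print(line)
print(len(line.encode("utf-8")), "bytes")
```

Dropping the "php" key from the map is all it would take in this sketch to leave that format out of the header.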

Change 547739 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Include Link headers in HTTP responses

https://gerrit.wikimedia.org/r/547739

Everything looks good to me on https://wikidata.beta.wmflabs.org/. Thank you, folks!

What is still pending is the reverse: those formats should point to each other and to the HTML version. This step is important because:

  • unlike the HTML version, where the link headers also appear in the content, with the alternative formats the links don't appear anywhere;
  • some agents give more credit to reciprocal links.
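To make the idea concrete, here is one possible shape for such reciprocal headers, sketched in Python. Everything in it is an assumption for illustration: the function name `reciprocal_link_header` is hypothetical, using rel="alternate" with type="text/html" for the HTML page is just one option, and nothing here reflects what EntityDataRequestHandler.php actually does.

```python
# Extension -> media type, as listed in the Link header output in this task.
FORMATS = {
    "json": "application/json",
    "php": "application/vnd.php.serialized",
    "rdf": "application/rdf+xml",
    "n3": "text/n3",
    "ttl": "text/turtle",
    "nt": "application/n-triples",
    "jsonld": "application/ld+json",
}


def reciprocal_link_header(entity_data_base, html_url, entity_id, current_ext):
    """Build a Link value for one structured format that points at the
    HTML page and at every other structured format (hypothetical design)."""
    entries = [f'<{html_url}>; rel="alternate"; type="text/html"']
    for ext, mime in FORMATS.items():
        if ext == current_ext:
            continue  # do not link a representation to itself
        entries.append(
            f'<{entity_data_base}/{entity_id}.{ext}>; rel="alternate"; type="{mime}"'
        )
    return ",".join(entries)


turtle_links = reciprocal_link_header(
    "https://www.wikidata.org/wiki/Special:EntityData",
    "https://www.wikidata.org/wiki/Q5",
    "Q5",
    "ttl",
)
print(turtle_links)
```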

I don't know whether I should try to modify repo/includes/LinkedData/EntityDataRequestHandler.php or whether we'll finish sooner by leaving this to the experts from the start. What do you think?

abian updated the task description. Aug 28 2020, 1:18 PM
abian added a comment. Aug 28 2020, 1:52 PM

This is already working for the HTML pages of Wikidata.

  • A request for a page that is not a Wikibase entity does not generate a response with links to alternative structured representations.
$ curl --head 'https://www.wikidata.org/wiki/Wikidata:Main_Page' | grep 'link:' # existing page in the Wikidata namespace
¯\_(ツ)_/¯
$ curl --head 'https://www.wikidata.org/wiki/EntitySchema:E2' | grep 'link:' # existing page in the EntitySchema namespace
¯\_(ツ)_/¯
$ curl --head 'https://www.wikidata.org/wiki/Q7' | grep 'link:' # non-existing Item
¯\_(ツ)_/¯
  • A request for a Wikibase Item generates a response with links to alternative structured representations.
$ curl --head 'https://www.wikidata.org/wiki/Q5' | grep 'link:'
link: <https://www.wikidata.org/wiki/Special:EntityData/Q5.json>; rel="alternate"; type="application/json",<https://www.wikidata.org/wiki/Special:EntityData/Q5.php>; rel="alternate"; type="application/vnd.php.serialized",<https://www.wikidata.org/wiki/Special:EntityData/Q5.rdf>; rel="alternate"; type="application/rdf+xml",<https://www.wikidata.org/wiki/Special:EntityData/Q5.n3>; rel="alternate"; type="text/n3",<https://www.wikidata.org/wiki/Special:EntityData/Q5.ttl>; rel="alternate"; type="text/turtle",<https://www.wikidata.org/wiki/Special:EntityData/Q5.nt>; rel="alternate"; type="application/n-triples",<https://www.wikidata.org/wiki/Special:EntityData/Q5.jsonld>; rel="alternate"; type="application/ld+json"
  • A request for a Wikibase Property generates a response with links to alternative structured representations.
$ curl --head 'https://www.wikidata.org/wiki/Property:P39' | grep 'link:'
link: <https://www.wikidata.org/wiki/Special:EntityData/P39.json>; rel="alternate"; type="application/json",<https://www.wikidata.org/wiki/Special:EntityData/P39.php>; rel="alternate"; type="application/vnd.php.serialized",<https://www.wikidata.org/wiki/Special:EntityData/P39.rdf>; rel="alternate"; type="application/rdf+xml",<https://www.wikidata.org/wiki/Special:EntityData/P39.n3>; rel="alternate"; type="text/n3",<https://www.wikidata.org/wiki/Special:EntityData/P39.ttl>; rel="alternate"; type="text/turtle",<https://www.wikidata.org/wiki/Special:EntityData/P39.nt>; rel="alternate"; type="application/n-triples",<https://www.wikidata.org/wiki/Special:EntityData/P39.jsonld>; rel="alternate"; type="application/ld+json"
  • The link headers are well formatted.

They follow the pattern < + <url> + >; rel="alternate"; type=" + <type> + ".

  • All the alternative structured representations listed are available.

They are.

  • All the alternative structured representations available are listed.

They are. According to Special:EntityData: "json, php, n3, ttl, nt, rdf, jsonld".

  • All the alternative structured representations listed are about the requested Wikibase entity.

They are.
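The last two checks were done by eye here; a scripted version might look like the following Python sketch. The function name `check_listed_formats` is hypothetical, the set of available formats is copied from the Special:EntityData list quoted above, and the check assumes URLs of the form .../Special:EntityData/<id>.<ext>.

```python
import re

# Formats advertised by Special:EntityData, as quoted above.
AVAILABLE = {"json", "php", "n3", "ttl", "nt", "rdf", "jsonld"}


def check_listed_formats(header_value, entity_id, available=AVAILABLE):
    """Compare the formats listed in a Link value with the available ones,
    and collect URLs that are not about the requested entity."""
    urls = re.findall(r'<([^>]*)>', header_value)
    about_entity = re.compile(
        r'/Special:EntityData/' + re.escape(entity_id) + r'\.(\w+)$'
    )
    listed, off_topic = set(), []
    for url in urls:
        match = about_entity.search(url)
        if match:
            listed.add(match.group(1))
        else:
            off_topic.append(url)
    return {
        "missing": available - listed,      # available but not listed
        "unexpected": listed - available,   # listed but not available
        "off_topic": off_topic,             # not about the requested entity
    }


partial = (
    '<https://www.wikidata.org/wiki/Special:EntityData/Q5.json>; '
    'rel="alternate"; type="application/json",'
    '<https://www.wikidata.org/wiki/Special:EntityData/Q50.ttl>; '
    'rel="alternate"; type="text/turtle"'
)
print(check_listed_formats(partial, "Q5"))
```

All three result fields being empty would correspond to the "They are." answers above.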

And it also seems to work on test.wikidata.org (and, by extension, any Wikibase installation from now on).

$ curl --head 'https://test.wikidata.org/wiki/Q10' | grep 'link:'
link: <https://test.wikidata.org/wiki/Special:EntityData/Q10.json>; rel="alternate"; type="application/json",<https://test.wikidata.org/wiki/Special:EntityData/Q10.php>; rel="alternate"; type="application/vnd.php.serialized",<https://test.wikidata.org/wiki/Special:EntityData/Q10.rdf>; rel="alternate"; type="application/rdf+xml",<https://test.wikidata.org/wiki/Special:EntityData/Q10.n3>; rel="alternate"; type="text/n3",<https://test.wikidata.org/wiki/Special:EntityData/Q10.ttl>; rel="alternate"; type="text/turtle",<https://test.wikidata.org/wiki/Special:EntityData/Q10.nt>; rel="alternate"; type="application/n-triples",<https://test.wikidata.org/wiki/Special:EntityData/Q10.jsonld>; rel="alternate"; type="application/ld+json"

This looks finished. @Lydia_Pintscher, as you've "sponsored" this, I am letting you make the final call.

Lydia_Pintscher closed this task as Resolved. Oct 15 2020, 9:50 AM

Yay! :)
Let's handle additional changes in a different ticket then, if you'd like, @abian.