Add CORS headers to all redirects in the chain from https://www.wikidata.org/entity/{Q...}
Open, Normal, Public

Description

I am developing a few new Solid web applications that run in a web browser and get data via CORS. They use linked data, and I would like to use Wikidata as a common reference, in the same way that people often use DBpedia.

Taking http://www.wikidata.org/entity/Q1141085 as an example:

fetch('http://www.wikidata.org/entity/Q1141085')

Chrome 54

Fetch API cannot load https://www.wikidata.org/entity/Q1141085. Redirect from 'https://www.wikidata.org/entity/Q1141085' to 'https://www.wikidata.org/wiki/Special:EntityData/Q1141085' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'null' is therefore not allowed access. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

Firefox 49

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://www.wikidata.org/entity/Q1141085. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing).

Checking from the CLI with curl:

curl -IL http://www.wikidata.org/entity/Q1141085 -H "Accept: text/turtle"

HTTP/1.1 301 Moved Permanently
Location: https://www.wikidata.org/entity/Q1141085

HTTP/1.1 303 See Other
Location: https://www.wikidata.org/wiki/Special:EntityData/Q1141085

HTTP/1.1 303 See Other
Location: https://www.wikidata.org/wiki/Special:EntityData/Q1141085.ttl

HTTP/1.1 200 OK
Content-Type: text/turtle; charset=UTF-8
Access-Control-Allow-Origin: *

So while the final 200 OK response includes the CORS header Access-Control-Allow-Origin: *, a browser client that starts from the URI denoting the entity, http://www.wikidata.org/entity/Q1141085, can NOT follow all the redirects to arrive at that response, because the redirect responses along the way do not carry that header.
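
The check described above can be done mechanically. The following is a minimal sketch, not part of the original report: it parses the curl output shown earlier into one record per response and lists the responses that lack Access-Control-Allow-Origin. The function names parseHops and hopsMissingCors are made up for illustration, and the sample text contains only the headers quoted in this task.

```javascript
// Split `curl -IL` output into one record per HTTP response.
// (Hypothetical helper, written for this illustration only.)
function parseHops(curlOutput) {
  return curlOutput
    .trim()
    .split(/\r?\n\s*\r?\n/) // a blank line separates two responses
    .map((block) => {
      const [statusLine, ...headerLines] = block.split(/\r?\n/);
      const status = Number(statusLine.trim().split(/\s+/)[1]);
      const headers = {};
      for (const line of headerLines) {
        const idx = line.indexOf(':');
        if (idx === -1) continue; // not a "Name: value" line
        const name = line.slice(0, idx).trim().toLowerCase();
        headers[name] = line.slice(idx + 1).trim();
      }
      return { status, headers };
    });
}

// Return every response that does not carry the CORS header.
function hopsMissingCors(hops) {
  return hops.filter((hop) => !('access-control-allow-origin' in hop.headers));
}

// Headers copied from the curl run quoted in this task.
const output = `
HTTP/1.1 301 Moved Permanently
Location: https://www.wikidata.org/entity/Q1141085

HTTP/1.1 303 See Other
Location: https://www.wikidata.org/wiki/Special:EntityData/Q1141085

HTTP/1.1 303 See Other
Location: https://www.wikidata.org/wiki/Special:EntityData/Q1141085.ttl

HTTP/1.1 200 OK
Content-Type: text/turtle; charset=UTF-8
Access-Control-Allow-Origin: *
`;

const hops = parseHops(output);
const blocked = hopsMissingCors(hops);
console.log(blocked.map((hop) => `${hop.status} -> ${hop.headers.location}`));
```

With the headers quoted above, all three redirect responses are reported and only the final 200 OK passes, which matches what Chrome and Firefox complain about.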

Daniel Kinzler suggested that I report this issue here:
https://twitter.com/brightbyte/status/795994606282428416

Restricted Application added a subscriber: Aklapper. Nov 8 2016, 8:53 PM
hoo added a comment. Nov 9 2016, 3:46 PM

These redirect headers are set via Apache configuration in modules/mediawiki/files/apache/sites/wikidata-uris.incl in https://gerrit.wikimedia.org/r/operations/puppet. I don't think Apache has a nice way to set additional headers for only these cases.

This issue also looks related to https://phabricator.wikimedia.org/T119536, which, as I understand it, proposes removing the second 303. I +1 that.

Where do you set the CORS header for https://www.wikidata.org/wiki/Special:EntityData/Q1141085.ttl or https://www.wikidata.org/wiki/Special:EntityData/Q1141085.json? Those responses have Access-Control-Allow-Origin: *

Change 320691 had a related patch set uploaded (by Hoo man):
Always set "Access-Control-Allow-Origin: *" in EntityDataRequestHandler

https://gerrit.wikimedia.org/r/320691

hoo added a comment. Nov 9 2016, 10:03 PM

I've just added a patch for always setting the CORS header for Special:EntityData, but that will only solve part of the problem here.

A potential solution I could think of would be to ProxyPass the /entity/ (and related) redirects to the special page. That would also solve us T119536 for free, as far as I can tell.

A potential solution I could think of would be to ProxyPass the /entity/ (and related) redirects to the special page. That would also solve us T119536 for free, as far as I can tell.

brilliant!

so we should get

curl -IL http://www.wikidata.org/entity/Q1141085 -H "Accept: text/turtle"

HTTP/1.1 303 See Other
Location: https://www.wikidata.org/wiki/Special:EntityData/Q1141085.ttl
Access-Control-Allow-Origin: *

HTTP/1.1 200 OK
Content-Type: text/turtle; charset=UTF-8
Access-Control-Allow-Origin: *
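
Once every response in the chain carries the header, a browser client could dereference the entity URI directly. The following is a minimal sketch under that assumption; entityRequest and fetchEntity are hypothetical helper names. It starts from the HTTPS form of the URI, since the 301 from HTTP to HTTPS is not part of the expected output above.

```javascript
// Hypothetical helper: build the arguments for fetch() for one entity.
// Uses the HTTPS form of the entity URI; the canonical URI is HTTP, but
// the HTTP-to-HTTPS hop is not covered by the expected output above.
function entityRequest(entityId, mediaType) {
  return {
    url: `https://www.wikidata.org/entity/${entityId}`,
    init: {
      headers: { Accept: mediaType }, // drives content negotiation
      mode: 'cors',
      redirect: 'follow', // let the browser follow the 303
    },
  };
}

// Hypothetical helper: fetch the entity data as text.
// Only works if every response in the redirect chain passes the CORS check.
async function fetchEntity(entityId, mediaType = 'text/turtle') {
  const { url, init } = entityRequest(entityId, mediaType);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Unexpected status ${response.status} for ${url}`);
  }
  return response.text();
}
```

A call such as fetchEntity('Q1141085') would then resolve to the Turtle document, provided the redirects behave as sketched in the expected output.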

thiemowmde triaged this task as Normal priority. Nov 10 2016, 8:49 AM
thiemowmde assigned this task to hoo.
thiemowmde moved this task from incoming to in current sprint on the Wikidata board.
thiemowmde added a subscriber: thiemowmde.
daniel added a comment. Edited Nov 10 2016, 2:54 PM

A potential solution I could think of would be to ProxyPass the /entity/ (and related) redirects to the special page.

We can proxy if we are sure the request is going to trigger a redirect, e.g. because of content negotiation. So /entity/Q1234 can be proxied, since /wiki/Special:EntityData/Q1234 is going to trigger a redirect. But /entity/Q1234.ttl must NOT be proxied, since /wiki/Special:EntityData/Q1234.ttl is going to return data straight away.

So /entity/Q1234.ttl must remain a redirect, or we could prohibit it. We currently allow it as a shorthand, but it's not documented, and it's semantically meaningless (there is no Turtle-representation of the concept - just of the document).

Change 320691 merged by jenkins-bot:
Always set "Access-Control-Allow-Origin: *" in EntityDataRequestHandler

https://gerrit.wikimedia.org/r/320691

elf-pavlik added a comment. Edited Nov 10 2016, 3:11 PM

But that would be semantically dirty. As far as I understand, it's best practice to not serve data directly when resolving a concept URI, but to redirect to a document URL first.

I understood that it would work this way

The URI of a thing, http://www.wikidata.org/entity/Q1141085, after the first 301 to HTTPS,
would HTTP 303 redirect to the document with the representation in the requested content type:
https://www.wikidata.org/wiki/Special:EntityData/Q1141085.ttl

Currently, as explained in T119536, you have two 303 redirects. If you proxy pass the URI of a thing (instead of issuing the first 303 redirect) to the Special:EntityData page, which seems to handle content negotiation, it should still 303 redirect you (currently the second 303 redirect) to the document with the requested content type. So we should end up with this case: https://www.w3.org/TR/cooluris/#r303uri

daniel added a comment. Edited Nov 10 2016, 3:12 PM
That would also solve us T119536 for free, as far as I can tell.

It's pretty much exactly what T119536 is proposing, so I'm not sure about "free", but yea :) We can pick rewrite vs proxy, but I see no difference in practice.

hoo added a comment. Nov 10 2016, 11:35 PM

A potential solution I could think of would be to ProxyPass the /entity/ (and related) redirects to the special page.

We can proxy if we are sure the request is going to trigger a redirect, e.g. because of content negotiation. So /entity/Q1234 can be proxied, since /wiki/Special:EntityData/Q1234 is going to trigger a redirect. But /entity/Q1234.ttl must NOT be proxied, since /wiki/Special:EntityData/Q1234.ttl is going to return data straight away.

So /entity/Q1234.ttl must remain a redirect, or we could prohibit it. We currently allow it as a shorthand, but it's not documented, and it's semantically meaningless (there is no Turtle-representation of the concept - just of the document).

Indeed, we could answer queries for /entity/Q1234 directly (ProxyPass), but still issue a 301 to Special:EntityData in case there's a (pseudo) file extension attached.
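
The rule discussed here can be sketched as a small decision function. This is only an illustration in JavaScript; the real rule would live in the Apache configuration, and routeEntityPath, the action names and the "dot means extension" test are assumptions made for this sketch.

```javascript
// Path prefix of the special page that serves entity data.
const SPECIAL_PAGE = '/wiki/Special:EntityData/';

// Hypothetical decision function for requests under /entity/.
// - no (pseudo) file extension: proxy to the special page, which then
//   answers with a redirect chosen by content negotiation
// - with extension (e.g. ".ttl"): keep an explicit redirect, because the
//   special page would return data straight away
function routeEntityPath(path) {
  const match = /^\/entity\/([^/]+)$/.exec(path);
  if (!match) {
    return { action: 'pass' }; // not an entity URI, leave it alone
  }
  const name = match[1];
  // Assumption: a dot in the last path segment marks a file extension.
  const hasExtension = name.includes('.');
  return {
    action: hasExtension ? 'redirect' : 'proxy',
    target: SPECIAL_PAGE + name,
  };
}

console.log(routeEntityPath('/entity/Q1234'));
console.log(routeEntityPath('/entity/Q1234.ttl'));
```

The sketch deliberately leaves out the redirect status code and the CORS header, since those are still under discussion in this task.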

Thiemo pointed out another complication: our canonical URIs are HTTP, not HTTPS. There is a redirect from the HTTP URL to the HTTPS URL.

The HTTP-to-HTTPS redirect would need to have the Access-Control headers set. We don't want to proxy the content here, because we want the client to receive the data over a secure connection. HTTP is really a deprecated interface.

Perhaps the suggested force-redirect parameter can fix this, too. We can just treat HTTP and HTTPS access to the /entity/ path the same, without a redirect from one to the other. In both cases, we rewrite or proxy, and set the force-redirect param. This way, the /entity/ path will always return a redirect (or error), and always with the correct Access-Control headers.

Jonas moved this task from Review to Done on the Wikidata-Sprint board. Feb 6 2017, 11:23 AM
thiemowmde removed hoo as the assignee of this task.

It appears this got stuck in the review column since December, but there was nothing to review. Moving it back to where it came from.