[RFC] should wikidata.org/entity/Q12345 do content negotiation, instead of redirecting to wikidata.org/wiki/Special:EntityData/Q36661 first?
Open, NormalPublic

Description

Currently, wikidata.org/entity/Q12345 triggers a 303 redirect to wikidata.org/wiki/Special:EntityData/Q12345, which then applies content negotiation and then triggers another 303 redirect to wikidata.org/wiki/Special:EntityData/Q12345.xyz (or to wikidata.org/wiki/Q12345 if appropriate).

If wikidata.org/entity/Q12345 used an internal rewrite (or proxy) to request wikidata.org/wiki/Special:EntityData/Q12345, only one 303 redirect would be sent to the client. That would better match what is proposed in http://www.w3.org/TR/cooluris/#r303gendocument.

Current sequence (with T119532 fixed):

	> wget -S https://www.wikidata.org/entity/Q36661 2>&1 | egrep '^ *HTTP|^ *Location' 
	HTTP request sent, awaiting response... 
		HTTP/1.1 303 See Other
		Location: https://www.wikidata.org/wiki/Special:EntityData/Q36661
	HTTP request sent, awaiting response... 
		HTTP/1.1 303 See Other
		Location: https://www.wikidata.org/wiki/Special:EntityData/Q36661.json
	HTTP request sent, awaiting response... 
		HTTP/1.1 200 OK

Proposed sequence:

	> wget -S https://www.wikidata.org/entity/Q36661 2>&1 | egrep '^ *HTTP|^ *Location' 
	HTTP request sent, awaiting response... 
		HTTP/1.1 303 See Other
		Location: https://www.wikidata.org/wiki/Special:EntityData/Q36661.json
	HTTP request sent, awaiting response... 
		HTTP/1.1 200 OK

NOTE: we can only rewrite (or proxy) from the /entity/Q... path if we are sure that the target URL will still trigger a redirect (or error). We should never return content from the /entity/ path, to avoid cache pollution and confusion about subject URI and document URL. So, /entity/Q1234.ttl must not be rewritten but redirected to /wiki/Special:EntityData/Q1234.ttl, since that will return data. To ensure this, a magic parameter could be passed, something like force-redirect=true, which would force Special:EntityData to send a redirect to the canonical URL instead of serving content directly.
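
To make the guarantee in the NOTE concrete, here is a minimal Python sketch of how such a flag could behave. It is a toy model, not Wikibase code: `entity_data()`, `entity()`, `FORMATS` and the error statuses are hypothetical, and only the `force-redirect` idea comes from the description above. The HTML case (redirecting to /wiki/Q1234) is left out.

```python
# Toy model of Special:EntityData with a force-redirect flag.
# BASE, FORMATS, entity_data() and entity() are illustrative names only.
BASE = "/wiki/Special:EntityData/"
FORMATS = {"application/json": "json", "text/turtle": "ttl"}


def entity_data(name, accept="application/json", force_redirect=False):
    """Return (status, location_or_body) for Special:EntityData/<name>."""
    entity_id, dot, ext = name.partition(".")
    if dot:
        if ext not in FORMATS.values():
            return 400, "unsupported format"  # placeholder error status
        if force_redirect:
            # Reached via /entity/: redirect to the canonical document URL
            # instead of serving content.
            return 303, BASE + name
        return 200, f"<{ext} document describing {entity_id}>"
    ext = FORMATS.get(accept)
    if ext is None:
        return 406, "no acceptable format"
    # No extension given: content negotiation always ends in a redirect.
    return 303, f"{BASE}{entity_id}.{ext}"


def entity(name, accept="application/json"):
    """Model /entity/<name> as an internal rewrite that always sets the flag."""
    status, payload = entity_data(name, accept, force_redirect=True)
    assert status != 200, "content must never be served from /entity/"
    return status, payload
```

With the flag set, both `entity("Q1234")` and `entity("Q1234.ttl")` end in a 303 (or an error), while a direct call such as `entity_data("Q1234.ttl")` still serves the document.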
daniel created this task. Nov 24 2015, 6:02 PM
daniel updated the task description.
daniel raised the priority of this task from to Needs Triage.
daniel added a project: Wikidata.
daniel added subscribers: daniel, JanZerebecki.
Restricted Application added subscribers: StudiesWorld, Aklapper. Nov 24 2015, 6:02 PM
daniel renamed this task from [RFC] should wikidata.org/entity/Q12345 do content negotiation, or redirect to wikidata.org/wiki/Special:EntityData/Q36661 first? to [RFC] should wikidata.org/entity/Q12345 do content negotiation, instead of redirecting to wikidata.org/wiki/Special:EntityData/Q36661 first?.
daniel set Security to None.
Lydia_Pintscher triaged this task as Normal priority. Dec 18 2015, 10:25 AM

+1 to the proposal of removing the second 303, which seems to match https://www.w3.org/TR/cooluris/#r303uri

hoo added a subscriber: hoo. Nov 9 2016, 10:01 PM

@elf-pavlik this really proposes to remove the first 303, and keep the second. But I guess that's what you mean.

Correct, +1 to removing the redirect to the 'in-between' page and, after the 301 to HTTPS, doing a single 303 redirect directly to the content-negotiated representation, where in your case the URI ends with .ttl, .json, etc.

daniel updated the task description. Nov 10 2016, 3:30 PM

So, /entity/Q1234.ttl must not be rewritten but redirected to /wiki/Special:EntityData/Q1234.ttl, since that will return data. To ensure this, a magic parameter could be passed, something like force-redirect=true, which would force Special:EntityData to send a redirect to the canonical URL instead of serving content directly.

/entity/Q1234.ttl currently does NOT appear in the chain of redirects from /entity/Q1234, and it doesn't seem to serve any purpose to have URIs like /entity/Q1234.ttl appearing anywhere at all.

Currently /entity/Q1234 redirects to /wiki/Special:EntityData/Q1234 which seems to handle content negotiation and redirects to /wiki/Special:EntityData/Q1234.ttl or /wiki/Special:EntityData/Q1234.json.

Preferably, /entity/Q1234 would directly issue a 303 redirect to /wiki/Special:EntityData/Q1234.ttl or /wiki/Special:EntityData/Q1234.json. That should work if you proxy pass to /wiki/Special:EntityData/Q1234, which seems to handle content negotiation and issue the 303 redirect (it should also have CORS headers set).
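
As a rough illustration of that single negotiated redirect, the following Python sketch picks a format from the Accept header and builds the 303 target. It is only a sketch under assumptions: `SUPPORTED`, `negotiate()` and `redirect_for()` are made-up names, only two formats are modelled, and the HTML case is ignored.

```python
# Illustrative only: SUPPORTED, negotiate() and redirect_for() are assumptions.
SUPPORTED = {"application/json": "json", "text/turtle": "ttl"}


def negotiate(accept_header, default="application/json"):
    """Pick the supported media type with the highest q-value."""
    best_type, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type == "*/*":
            media_type = default
        if media_type in SUPPORTED and q > best_q:
            best_type, best_q = media_type, q
    return best_type


def redirect_for(entity_id, accept_header):
    """Return the (status, Location) a client would see for /entity/<id>."""
    media_type = negotiate(accept_header)
    if media_type is None:
        return 406, None
    ext = SUPPORTED[media_type]
    return 303, f"/wiki/Special:EntityData/{entity_id}.{ext}"
```

For example, `redirect_for("Q1234", "text/turtle")` yields a 303 to /wiki/Special:EntityData/Q1234.ttl, with no intermediate hop through the extension-less URL.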

Once again, /entity/Q1234.ttl or /entity/Q1234.json don't seem to have any purpose and such URIs should never appear anywhere.

Once again, /entity/Q1234.ttl or /entity/Q1234.json don't seem to have any purpose and such URIs should never appear anywhere.

I agree that /entity/Q1234.xyz is semantically unclear, and we don't need it. We have supported it as an undocumented shorthand for years now, though. We can remove support for this, but we should not do so without consideration.

In any case, we should have a mechanism that makes *sure* we never return content from an /entity/ path. That's why I proposed the force-redirect parameter. Even if we disallow Q1234.xyz, there is nothing that guarantees that Special:EntityData/Q1234 will always cause a redirect (or error). We need to add that guarantee.

@hoo Can you point me to the config file that defines these rewrites/redirects? I keep forgetting where to look for that kind of stuff.

hoo added a comment. Nov 16 2016, 10:25 PM

@hoo Can you point me to the config file that defines these rewrites/redirects? I keep forgetting where to look for that kind of stuff.

See T150290#2782642. I plan to pick this up myself eventually, but I can't tell when.

daniel moved this task from Inbox to Push on the User-Daniel board. Jan 5 2017, 7:02 PM

I'm all for the proposed change.

Please correct me if I'm wrong, but this is what I understand:

  • https://www.wikidata.org/entity/Q36661 is what we call a canonical concept URI. This is supposed to be called without a file extension, and supposed to do content negotiation.
  • Special:EntityData is supposed to be called with a file extension. We want this to be cacheable. The fact that the same special page also does content negotiation is a technical detail, because the code doing this must be somewhere.

This means https://www.wikidata.org/wiki/Special:EntityData/Q36661 is a technical detail that could as well be hidden.

What am I missing? What could be the worst negative effect this change may have?

Esc3300 added a subscriber: Esc3300. Mar 9 2017, 7:41 AM

Isn't there another step involved, depending on how people call it?

http --> https

daniel added a comment. Mar 9 2017, 2:33 PM

Please correct me if I'm wrong, but this is what I understand:

You are correct, though we should also consider convenience.

The canonical URI does not have a file extension. But being able to call /entity/Q64.json is very convenient. I'd like to keep support for it if that can be done without too much trouble.

  • Special:EntityData is supposed to be called with a file extension. We want this to be cacheable.

Yes. E.g. https://www.wikidata.org/wiki/Special:EntityData/Q36661.json is the canonical URI (and URL) of the JSON document describing Q36661.

The fact that the same special page also does content negotiation is a technical detail, because the code doing this must be somewhere.

Indeed.

This means https://www.wikidata.org/wiki/Special:EntityData/Q36661 is a technical detail that could as well be hidden.

Yes, indeed. But the URL will be used internally (that's what /entity/Q36661 gets rewritten to) to trigger content negotiation. Preventing external access while allowing internal access may not be trivial, and may cause confusion. Supporting this URL externally may also be useful for testing.

What am I missing? What could be the worst negative effect this change may have?

I don't see any. It just needs doing.

Change 357985 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] Make /entity/ redirect internal

https://gerrit.wikimedia.org/r/357985

@Ladsgroup please note the NOTE section

@daniel: Done. I overlooked it (it's big but still...)

Ladsgroup added a project: Wikidata-Sprint.
Restricted Application added a project: User-Ladsgroup. Mon, Jun 12, 10:20 AM
Ladsgroup moved this task from Proposed to Review on the Wikidata-Sprint board. Mon, Jun 12, 10:20 AM

The current patch defines two rewrite rules:

RewriteRule ^/entity/([^.]*)$ %{ENV:RW_PROTO}://%{SERVER_NAME}/wiki/Special:EntityData/$1 [QSA]
RewriteRule ^/entity/(.*\..*)$ %{ENV:RW_PROTO}://%{SERVER_NAME}/wiki/Special:EntityData/$1 [R=303,QSA]

Requests of the form /entity/Q1234 will trigger an internal rewrite, and Special:EntityData will then trigger a 303 redirect.
Requests of the form /entity/Q1234.json will trigger a 303 redirect, and Special:EntityData will directly serve data.
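
To check which of the two rules a given path would hit, the patterns can be tried out in Python. This is only a sketch: the two regular expressions are copied from the rules above, while `classify()` and the action labels are hypothetical helpers that mimic which rule fires and do not reproduce Apache's full rewrite behaviour.

```python
import re

# Patterns copied from the two RewriteRule lines; everything else is illustrative.
INTERNAL = re.compile(r"^/entity/([^.]*)$")
EXTERNAL = re.compile(r"^/entity/(.*\..*)$")
TARGET = "/wiki/Special:EntityData/"


def classify(path):
    """Return (action, target) for a request path, or None if no rule matches."""
    m = INTERNAL.match(path)
    if m:
        # No dot in the name: rewritten internally, Special:EntityData redirects.
        return "internal rewrite", TARGET + m.group(1)
    m = EXTERNAL.match(path)
    if m:
        # Name contains a dot: the client is redirected before data is served.
        return "303 redirect", TARGET + m.group(1)
    return None


for path in ("/entity/Q1234", "/entity/Q1234.json", "/wiki/Q1234"):
    print(path, "->", classify(path))
```

The two patterns are mutually exclusive (a name either contains a dot or it does not), so every /entity/ path is handled by exactly one rule.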

This should work OK. Now we just need someone from ops to check it and merge it.

Eventually, I'd like to see a solution that uses a single rule, and adds redirect=force to the parameters for Special:EntityData. Support for this will however need to be implemented in Special:EntityData first.
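
A single-rule variant could look roughly like the following Python sketch, which maps any /entity/ path to an internal Special:EntityData target and always appends redirect=force. The helper name `rewrite_entity()` and the handling of existing query parameters are assumptions for illustration; only the redirect=force parameter comes from the comment above.

```python
from urllib.parse import parse_qsl, urlencode


def rewrite_entity(path, query=""):
    """Map /entity/<name> to an internal target, always adding redirect=force."""
    prefix = "/entity/"
    if not path.startswith(prefix):
        return None
    name = path[len(prefix):]
    # Keep the caller's parameters (similar to [QSA]) but never let the
    # client override the 'redirect' parameter.
    params = [
        (k, v)
        for k, v in parse_qsl(query, keep_blank_values=True)
        if k != "redirect"
    ]
    params.append(("redirect", "force"))
    return f"/wiki/Special:EntityData/{name}?{urlencode(params)}"
```

Because the parameter is added for names with and without an extension, one rule would cover both cases, provided Special:EntityData honours it.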

Change 358631 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] mediawiki: match beta wikidata with production includes

https://gerrit.wikimedia.org/r/358631

Change 358631 merged by Filippo Giunchedi:
[operations/puppet@production] mediawiki: match beta wikidata with production includes

https://gerrit.wikimedia.org/r/358631

Change 359004 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/extensions/Wikibase@master] Add redirect=force option to Special:EntityData

https://gerrit.wikimedia.org/r/359004

Ladsgroup removed Ladsgroup as the assignee of this task. Thu, Jun 15, 9:00 AM
Ladsgroup removed a project: User-Ladsgroup.

I ran out of ideas on how to fix it.

Ladsgroup moved this task from Doing to Backlog on the Wikidata-Sprint board. Thu, Jun 15, 9:00 AM

Change 359004 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add redirect=force option to Special:EntityData

https://gerrit.wikimedia.org/r/359004

Once I13373c8859be885 is live, we should ask ops to look at https://gerrit.wikimedia.org/r/#/c/357985/ again.