
LDF service does not Vary responses by Accept, sending incorrect cached responses to clients
Open, Normal · Public · BUG REPORT

Description

The WDQS LDF service can return data in different formats for the same URL, depending on the client’s Accept header. However, its responses are cacheable and the Vary response header doesn’t include the Accept request header, so responses for different formats may be mixed up.


Original description:

Steps to Reproduce:
Make a request to https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP3417&page=2 with headers {"Accept": "application/ld+json"}

Actual Results:
HTML is returned

Expected Results:
JSON is returned

Note:
Requests to https://query.wikidata.org/bigdata/ldf?subject=&predicate=http://www.wikidata.org/prop/direct/P3417 and https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP3417&page=3 work correctly.

Thanks!

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 4 2019, 4:27 PM
Restricted Application added a project: Wikidata. · Sep 4 2019, 4:50 PM

Can’t reproduce:

$ curl -I -H 'Accept: application/ld+json' 'https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP3417&page=2'
HTTP/2 200 
date: Wed, 04 Sep 2019 16:55:06 GMT
content-type: application/ld+json;charset=utf-8
server: nginx/1.13.6
x-served-by: wdqs1005
access-control-allow-origin: *
cache-control: public, max-age=300
vary: Accept, Accept-Encoding
x-varnish: 40076957, 886568030 892394285, 888209857
accept-ranges: bytes
age: 0
x-cache: cp1085 pass, cp3032 hit/1, cp3041 pass
x-cache-status: hit-local
server-timing: cache;desc="hit-local"
strict-transport-security: max-age=106384710; includeSubDomains; preload
set-cookie: WMF-Last-Access=04-Sep-2019;Path=/;HttpOnly;secure;Expires=Sun, 06 Oct 2019 12:00:00 GMT
set-cookie: WMF-Last-Access-Global=04-Sep-2019;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sun, 06 Oct 2019 12:00:00 GMT
x-analytics: https=1;nocookies=1
x-client-ip: 90.187.22.233

This might be a caching problem – the Vary response header doesn’t include the Content-Type, so if you previously requested HTML for the same URL you might now get that cached response even when asking for JSON.
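The suspected mix-up can be illustrated with a toy cache. This is only a sketch of the general mechanism, not the actual Varnish configuration: `backend` and `ToyCache` are hypothetical names, and the cache key here is made to depend on the request's Accept header, which is the header the task was later renamed to refer to.

```python
from typing import Dict, Tuple


def backend(url: str, accept: str) -> str:
    """Hypothetical origin server: picks a format from the Accept header."""
    return "json-body" if "json" in accept else "html-body"


class ToyCache:
    """Toy cache; not a model of any real Varnish setup."""

    def __init__(self, vary_on_accept: bool) -> None:
        self.vary_on_accept = vary_on_accept
        self.store: Dict[Tuple[str, str], str] = {}

    def get(self, url: str, accept: str) -> str:
        # Without "Vary: Accept" the cache key is effectively the URL alone.
        key = (url, accept if self.vary_on_accept else "")
        if key not in self.store:
            self.store[key] = backend(url, accept)
        return self.store[key]


url = "/bigdata/ldf?page=2"

broken = ToyCache(vary_on_accept=False)
broken.get(url, "text/html")                         # someone asks for HTML first
mixed_up = broken.get(url, "application/ld+json")    # gets the cached HTML

fixed = ToyCache(vary_on_accept=True)
fixed.get(url, "text/html")
correct = fixed.get(url, "application/ld+json")      # separate cache entry
```

In the first case the JSON client receives whatever was cached first for that URL, which would match the symptom in the original report.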

Is there a way to call this so that I'm not hitting the cached version?

A POST request would probably bypass the cache, but the backend doesn’t seem to be happy with that (“400 Bad Request”). You can also try appending some random garbage to the URL (…&breakcache=asdfasdf).
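A minimal sketch of the second workaround, assuming the service ignores unknown query parameters (which the reply below suggests, but which is not documented here). The helper name `bust_cache` is hypothetical; the parameter name `breakcache` is taken from the example above.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bust_cache(url: str, param: str = "breakcache") -> str:
    """Append a random query parameter so the URL misses the cache."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as "subject=".
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))
    return urlunsplit(parts._replace(query=urlencode(query)))


original = (
    "https://query.wikidata.org/bigdata/ldf?subject="
    "&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP3417&page=2"
)
busted = bust_cache(original)
```

Each call produces a different URL, so every request is a cache miss. This is a debugging aid only; used routinely it defeats the cache for everyone.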

that worked, thanks!

Lucas_Werkmeister_WMDE renamed this task from “{"Accept": "application/ld+json"} Yields html results (for this url)” to “LDF service does not Vary responses by Content-Type, sending incorrect cached responses to clients”. · Sep 6 2019, 10:48 AM
Lucas_Werkmeister_WMDE updated the task description.
Restricted Application added a project: Operations. · Sep 6 2019, 10:49 AM
jbond triaged this task as Normal priority. · Sep 9 2019, 9:37 AM

Change 536168 had a related patch set uploaded (by Gehel; owner: Gehel):
[wikidata/query/rdf@master] Add a Vary: Content-Type header to LDF responses.

https://gerrit.wikimedia.org/r/536168

Lucas_Werkmeister_WMDE renamed this task from “LDF service does not Vary responses by Content-Type, sending incorrect cached responses to clients” to “LDF service does not Vary responses by Accept, sending incorrect cached responses to clients”. · Sep 12 2019, 1:42 PM
Lucas_Werkmeister_WMDE updated the task description.
BBlack added a subscriber: BBlack. · Sep 19 2019, 4:09 AM

We'll also need to normalize the incoming Accept headers up in the edge cache layer to avoid pointless vary explosions. Ideally the normalization should exactly match the application-layer logic that chooses the output content type. Do you have some pseudo-code (or real code link is fine too) description of how accept is parsed to select content-types?

Real code: MIMEParse.java makes the decision, is initialized with five MIME types, and then asked for the “best match” for the Accept header. It appears to take the “quality” parameter into account, somehow; on the other hand, others have already complained that it doesn’t understand application/ld+json as equivalent to application/json.

Gehel added a subscriber: Gehel. · Oct 1 2019, 8:05 AM

The MIME types used are:

  • text/html
  • application/rdf+xml
  • application/n-triples
  • application/ld+json
  • text/turtle
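The edge-layer normalization requested above could look roughly like the following. This is a simplified sketch, not the logic of MIMEParse.java: it only handles exact matches against the five types listed, treats wildcards such as `*/*` as "no preference", and the fallback to `text/html` is an assumption. The names `SUPPORTED`, `DEFAULT` and `normalize_accept` are hypothetical.

```python
SUPPORTED = [
    "text/html",
    "application/rdf+xml",
    "application/n-triples",
    "application/ld+json",
    "text/turtle",
]
DEFAULT = "text/html"  # assumption: fallback when nothing matches


def normalize_accept(header: str) -> str:
    """Collapse an arbitrary Accept header to one supported MIME type."""
    best_type, best_q = DEFAULT, 0.0
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media = fields[0].lower()
        q = 1.0  # per HTTP, a missing q parameter means q=1
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Exact matches only; wildcards fall through to DEFAULT.
        if media in SUPPORTED and q > best_q:
            best_type, best_q = media, q
    return best_type


picked = normalize_accept("text/turtle;q=0.5, application/ld+json;q=0.9, image/png")
```

With a function like this in front of the cache, at most five variants exist per URL no matter how clients spell their Accept headers. As noted above, the real normalization would have to match the application's own selection logic exactly, otherwise the cache and the backend could disagree on the format.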
ema moved this task from Triage to Caching on the Traffic board. · Mon, Oct 14, 6:29 PM