Check caching headers in ORES responses
Open, Low, Public

Description

Re: 304-vs-200, I was able to get some 304s, but only when I dropped the If-None-Match header and relied on If-Modified-Since. It seems like the ETags might be inconsistent between serial requests to ORES for the same resource? (The Last-Modified timestamps are a bit inconsistent too.)

Event Timeline

Restricted Application added a subscriber: Aklapper.

What, exactly, is the problem? What's the proposed solution?

Halfak moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.

There are, very roughly, two mechanisms that parties communicating via HTTP use for caching. The server can tell the client how long the response is valid, and the client can then skip further requests for that time and use a local copy of the data instead. That is maximally effective (there is no remote communication at all), but the server must know in advance that the data will not change. Alternatively, the client (which has a local copy of past data but no guarantee that it is still valid) can tell the server what version of the data it has, and the server can then either respond with "that is still good" (skipping processing and keeping data transfer minimal) or send a more recent version of the data.
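As a rough illustration of the first mechanism (the server declaring how long a response stays valid), here is a minimal Python sketch of the check a client-side cache might do before deciding whether it can skip the request. The function names `freshness_lifetime` and `is_fresh` are made up for this example, and the logic is deliberately simplified: it only looks at `max-age`, `no-cache` and `no-store` in `Cache-Control`, and ignores `Expires`, `Age` and `s-maxage`, which a real cache would also need to handle.

```python
import time


def freshness_lifetime(cache_control):
    """Return how many seconds a response may be reused without asking the server.

    Simplified: only max-age, no-cache and no-store are considered.
    """
    directives = [d.strip().lower() for d in cache_control.split(",")]
    # no-store forbids caching; no-cache requires revalidation before reuse.
    # Either way the client cannot simply skip the request.
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        name, _, value = directive.partition("=")
        if name.strip() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return 0


def is_fresh(stored_at, cache_control, now=None):
    """True if a copy stored at `stored_at` (epoch seconds) can still be reused."""
    now = time.time() if now is None else now
    return (now - stored_at) < freshness_lifetime(cache_control)
```

With `Cache-Control: max-age=60`, a copy stored 30 seconds ago would be reused with no network traffic at all, while one stored 61 seconds ago would need a new (possibly conditional) request.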

The latter is done with the Last-Modified/If-Modified-Since headers (the server tags the data with the date when it last changed; the client sends that date back in future requests; the server compares it with the current last-modified date and sends an empty response when they match) or (preferably) with the ETag/If-None-Match headers (the server tags the data with some sort of unique version id, typically a content hash; the client sends the tag(s) it has in its local cache; the server sends an empty response on a match). When the client has up-to-date data, the server should send a 304 Not Modified (with no body) instead of the usual 200 OK. Intermediate caches (like Varnish) also rely on these headers to optimize transfer volume.
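To make the validation flow above concrete, here is a small Python sketch of how a server could decide between 304 and 200. It is not the ORES or Flask implementation; `make_etag` and `conditional_status` are hypothetical helpers written for this comment, the header lookup assumes exact-case keys in a plain dict, and the ETag list is split naively on commas. It follows the HTTP rule that If-None-Match, when present, takes precedence and If-Modified-Since is then ignored.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body):
    """Build a strong ETag from a content hash (one common choice, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def _strip_weak(tag):
    # Weak comparison: ignore the W/ prefix on either side.
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match wins; If-Modified-Since is not consulted at all.
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates:
            return 304
        if _strip_weak(etag) in {_strip_weak(t) for t in candidates}:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

One consequence of the precedence rule: if the server produces a different ETag on each request for the same content, a client sending both headers never gets a 304, even when the dates match, and only gets one after dropping If-None-Match. That would be consistent with the behaviour reported in the description, although whether it is the actual cause in ORES has not been verified here.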

It seems like the ORES web server (not the API, the part serving HTML and other assets) does not handle some of these headers correctly (see T137962#3026223), causing clients to re-download files that they already have. It would be nice to fix that (although it is a very minor issue, as it only affects people looking at the ORES web UI). While we are at it, it might be worth checking whether Wikilabels and the API part of ORES use caching headers correctly.

I guess the proposed solution is to check Google/Stack Overflow for Flask caching bugs and hope for the best :)