
MediaWiki should use ETags instead of Last-Modified and the Logged out cookie hack
Closed, Resolved | Public | Feature

Description

Right now MediaWiki supports browser caching of pages for both logged in and anonymous users through the Last-Modified header. MediaWiki uses a LoggedOut cookie hack to prevent anons from being served a stale cached page from when they were logged in. However, there are a number of problems:

  • As a result of the logged out cookie, after you log out you are served 200 responses instead of proper 304s for quite some time. In other words, even if you're capable of viewing a cache and your browser doesn't have a stale user cache in it you still won't get a cached page back. This means that logging in and then logging back out will make your wiki viewing potentially slower than being logged in.
  • As another result of the logged out cookie, even though you're an anon, you continue to bypass Squid caches in most configurations and don't get the advantage of seeing the same efficiently cached pages as all the other anons.
  • And to top it off, the logged out cookie hack doesn't solve the issue in the other direction. You can view a page, get a Last-Modified header back, log in, go back to the page, and get a 304 that tells your browser to load its cached logged-out copy of the page instead of the proper one with your logged-in interface.

The only way to fix all these troubles is to drop the LoggedOut cookie hack and instead use ETags that include info about the user. The ETag's contents will then change whenever a user logs out, logs in, has their talk page edited (because we want notifications to be sent), changes IP (if they're an anon and their IP address is being shown in the header), etc.
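As a rough illustration of the proposal, here is a minimal Python sketch of an ETag that folds user state into the validator. The function name `make_etag`, its parameters, and the choice of SHA-1 are all assumptions made for this example; none of them come from MediaWiki.

```python
import hashlib


def make_etag(revision_id, user_id=None, user_touched="", client_ip=None):
    """Build an ETag that changes whenever user-visible state changes.

    Hypothetical helper for illustration only, not MediaWiki code.
    """
    if user_id is not None:
        # Logged-in view: varies by user and by the user's "touched"
        # timestamp (e.g. bumped when their talk page is edited).
        identity = f"user:{user_id}:{user_touched}"
    else:
        # Anonymous view: varies by IP only when the IP is passed in,
        # i.e. when the skin actually displays it.
        identity = f"anon:{client_ip or ''}"
    raw = f"{revision_id}|{identity}"
    digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]
    # ETag values are quoted strings on the wire.
    return f'"{digest}"'
```

With this shape, logging in, logging out, or a talk page edit each produce a different ETag for the same revision, so a stale copy can never validate against the wrong state.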

There's an interesting note about ETags. Browsers can send multiple ETags in their If-None-Match header. As a result a browser can actually have a logged out version and one (or maybe more) logged in versions of a page in its cache. If you view a page, log in, view it again, and log back out, then when you go back to the page, even though you last viewed it with a different ETag and a different cached page, you could potentially end up getting a 304 because you still had a cached version for logged out users ;).
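The multi-ETag behaviour can be sketched as follows. This is a simplified matcher written for this example (the names `etag_matches` and `status_for` are hypothetical), and it assumes the ETag values themselves contain no commas, so splitting the header on commas is safe.

```python
def _strip_weak(tag):
    """Drop a leading W/ so tags compare under weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match, current_etag):
    """Return True if any ETag listed in If-None-Match matches the current one."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    current = _strip_weak(current_etag)
    # Assumption: no commas inside the quoted ETag values.
    return any(_strip_weak(c) == current for c in if_none_match.split(","))


def status_for(if_none_match, current_etag):
    """304 when the client already holds the current variant, else 200."""
    return 304 if etag_matches(if_none_match, current_etag) else 200
```

For example, a browser that holds both a logged-in and a logged-out copy might send `If-None-Match: "user5-r10", "anon-r10"`. After logging out, the server's current ETag is `"anon-r10"`, so `status_for` returns 304 and the browser reuses its logged-out copy.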


Version: unspecified
Severity: enhancement

Details

Reference
bz31639

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:54 PM
bzimport set Reference to bz31639.
bzimport added a subscriber: Unknown Object (MLST).

Sounds plausible. After a logout or session timeout (which the logout cookie doesn't currently handle), coming back will ask for the page with your last-seen logged-in ETag. The caching proxies don't have it cached since it's set for local-only, so they pass the request on to the backend app servers. MediaWiki then checks the ETag, sees that it doesn't match the user ID now active in your session (if any), and kicks back an anonymous page with the anonymous ETag.

Might need to distinguish between 'session but not logged in' and 'no session and not logged in'.

(In reply to comment #1)

> Might need to distinguish between 'session but not logged in' and 'no session and not logged in'.

For anons we could include, say, the first 4 characters of an MD5 hash of the session ID in the ETag if there is a session.
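A minimal sketch of that idea in Python, assuming the session ID is available as a string; the helper name `anon_session_tag` is made up for this example.

```python
import hashlib


def anon_session_tag(session_id):
    """Return a short ETag component for an anonymous session.

    Gives an empty string when there is no session, so that
    'no session' and 'session but not logged in' yield different ETags.
    """
    if not session_id:
        return ""
    # The first 4 hex characters of the MD5 digest, as suggested above.
    return hashlib.md5(session_id.encode("utf-8")).hexdigest()[:4]
```

Four hex characters only distinguish 65,536 values, so two sessions can share a tag. That is acceptable here because the tag only needs to separate "has a session" from "has no session" for the same client, not to identify sessions uniquely.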

Krinkle set Security to None.
Krinkle edited subscribers, added: ori; removed: Unknown Object (MLST).
Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM

The LoggedOut cookie has been removed in T142542.

Krinkle claimed this task.
Task description in the year 2011:

In other words, even if you're capable of viewing a cache and your browser doesn't have a stale user cache in it you still won't get a cached page back. This means that logging in and then logging back out will make your wiki viewing potentially slower than being logged in.

This was basically fixed as of 2016 (possibly earlier) per T142542, because logging out failed to actually set the LoggedOut cookie, but did clear most other session cookies, so you are probably back in the same state as someone who was never logged in.

In any event, browser caches honor Vary: Cookie and so don't rely on MediaWiki's Last-Modified hack. The presence of session cookies, and then the absence of those session cookies, suffices to make sure your browser won't even consider the "wrong" cache. Thus you don't see logged-out caches while logged in, won't see logged-in caches after logging out, and naturally qualify again for the same pre-login caches after logging out (assuming no other cookie changes). That part resolves the stated problem of this task.

This task suggested switching from Last-Modified to ETags, but that would not improve this today, because the predominant factor is the cookies. If the URL and the cookie headers (and any other headers named in Vary) aren't the same, then a cached response isn't a match under HTTP cache semantics and isn't considered by the browser. No If-Modified-Since is sent, so no server-side logic in MediaWiki is triggered for a possible HTTP 304 response.
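The Vary matching rule described here can be sketched as below. This is a simplified model of how a cache decides whether a stored response may be reused, not browser or MediaWiki code; the function name is hypothetical, and both header dictionaries are assumed to use lowercase keys.

```python
def vary_allows_reuse(vary, stored_request_headers, new_request_headers):
    """Check whether a stored response may serve a new request to the same URL.

    Every request header named in the stored response's Vary header
    must have the same value in the original and the new request.
    Assumption: both dicts use lowercase header names.
    """
    for name in vary.split(","):
        name = name.strip().lower()
        if not name:
            continue
        if name == "*":
            # "Vary: *" never matches a later request.
            return False
        if stored_request_headers.get(name) != new_request_headers.get(name):
            return False
    return True
```

A page fetched without cookies and stored with `Vary: Accept-Encoding, Cookie` is not reusable once a session cookie is present, and becomes reusable again once the cookie is gone. Only in the reusable case does the browser go on to revalidate with If-Modified-Since.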

MediaWiki does still use Last-Modified to revalidate caches based on the revision timestamp (page edits), the user touched timestamp (preferences, talk page notifications), and the CDN epoch (i.e. don't renew misc skin HTML and other global state beyond 14 days). These are all time-based today, which seems simple enough, equally effective as an ETag, and arguably easier to debug and less surprising.
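As a hedged sketch of that time-based scheme: the effective Last-Modified is the newest of the three timestamps, and a conditional request gets a 304 only if nothing is newer than the client's copy. The function names are invented for this example, and treating the CDN epoch as "now minus 14 days" is an assumption based on the description above rather than a statement of how MediaWiki computes it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def effective_last_modified(revision_ts, user_touched, now, max_age_days=14):
    """Newest of: page revision, user touched time, and the CDN epoch."""
    # Assumption: the CDN epoch is a rolling cutoff of now - max_age_days.
    cdn_epoch = now - timedelta(days=max_age_days)
    # HTTP dates have one-second resolution.
    return max(revision_ts, user_touched, cdn_epoch).replace(microsecond=0)


def status_for_time(if_modified_since, modified):
    """Return 304 if the client's copy is at least as new as `modified`."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparseable date is ignored, as if the header were absent.
        return 200
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return 304 if modified <= since else 200
```

The value sent to the client would be `format_datetime(modified, usegmt=True)`, which produces the HTTP date format and requires a UTC-aware datetime.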