Page MenuHomePhabricator

Core REST API marks responses as publicly cacheable while issuing set-cookie
Open, HighPublic

Description

See T264378#6516543 for example URIs

Core REST API endpoints set Cache-Control headers unconditionally, thus making responses publicly cacheable. However, it seems that for at least some cases MW also issues a Set-Cookie with a session cookie in the same response. Obviously, responses with a session cookie must not be cacheable in CDN.

We need to look what Action API does in this case and probably copy over the behavior. Most likely this will result in setting Cache-Control to private for logged-in users for MW REST API, since not only we can't really guarantee that Set-Cookie won't be issues, but we also can't properly guarantee we are not leaking any private data.

Additional consideration: we should probably implement some level of framework support for emitting Cache-Control headers and convert handlers to using it. Fixing individual handlers will be too error-prone in this case.

Event Timeline

daniel triaged this task as High priority.Oct 5 2020, 3:41 PM
daniel subscribed.

I see two things we could do:

  1. ensure that we only send Cache-Control if the response would be the same for any user. If user privileges are checked anywhere, directly or indirectly, me mustn't make the response cacheable. I note that this kind of thing could become much simpler and straight forward once we have Authority objects (T231930), since we could simply track privilege checks in the object, and later determine which privileges were used to create a response.
  2. Add a method for setting cache control headers to WebResponse, and use that instead of setting cache control headers directly. In that method, implement a cross-check, so an attempt to set a cache control header after setCookie() is called with log a warning and be ignored. Similarly, a call to setCookie() would log a warning, but it would remove the cache control head and still set the cookie.

From the Action API, https://gerrit.wikimedia.org/g/mediawiki/core/+/f43007d3f1cef7262f46dc4ef9ff753f4c5dbb94/includes/api/ApiMain.php#842 seems relevant.

Completely agreed on framework support for this - we don't want to have to think about this on a per-handler basis.

  1. ensure that we only send Cache-Control if the response would be the same for any user. If user privileges are checked anywhere, directly or indirectly, me mustn't make the response cacheable. I note that this kind of thing could become much simpler and straight forward once we have Authority objects (T231930), since we could simply track privilege checks in the object, and later determine which privileges were used to create a response.

Technically, this will make all responses uncacheable, since we are checking 'read' permission for all requests. The check should be if ( $checkedPermission && !PermissionManager::isEveryoneAllowed( $checkedPerm )

Action Api does the following to protect against this:

			if ( SessionManager::getGlobalSession()->isPersistent() ) {
				// Logged in or otherwise has session (e.g. anonymous users who have edited)
				// Mark request private
				$response->header( "Cache-Control: $privateCache" );

				return;
			}

I see two things we could do:

  1. ensure that we only send Cache-Control if the response would be the same for any user. If user privileges are checked anywhere, directly or indirectly, me mustn't make the response cacheable. I note that this kind of thing could become much simpler and straight forward once we have Authority objects (T231930), since we could simply track privilege checks in the object, and later determine which privileges were used to create a response.
  2. Add a method for setting cache control headers to WebResponse, and use that instead of setting cache control headers directly. In that method, implement a cross-check, so an attempt to set a cache control header after setCookie() is called with log a warning and be ignored. Similarly, a call to setCookie() would log a warning, but it would remove the cache control head and still set the cookie.

Would you object to mirroring the Action API's approach? From a quick review of the code, it appears that Action API:

  1. allows modules (handlers) to suggest a max-age value via ApiMain::setCacheMaxAge()
  2. inspects various things in ApiMain::sendCacheHeaders(), most relevantly SessionManager::getGlobalSession()->isPersistent(), and potentially override with private caching if necessary

It seems straightforward to add something similar to the REST API.

Longer term, I'm interested in using the Authority objects as you suggest. So I'm tempted to do my suggestion above for now and revisit this once the Authority change is in place.

Change 632356 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Add Cache-Control functions to REST API Handler class

https://gerrit.wikimedia.org/r/632356

Posted a really messy patch to illustrate what I'm suggesting. Don't worry too much about the details of the code yet, I'm just curious if this direction is worth pursuing.

T256395#6298796 has an inventory of affected endpoints that I could find back in July. (Probably worth redoing now.) That suggested the action API logic might not be working properly.

I'd really like to see one single method being used consistently everywhere, instead of various parts of MediaWiki reinventing the logic on their own, given the investigation above suggests some obvious-looking approaches might not be fully reliable. Providing a convenience wrapper for it in RestHandler makes sense, but in the end the REST API, the action API, RawAction etc. should all use the same helper method.

I'd really like to see one single method being used consistently everywhere, instead of various parts of MediaWiki reinventing the logic on their own, given the investigation above suggests some obvious-looking approaches might not be fully reliable.

Conceptually, what should this logic be? Making any response for a logged in user non-cacheable seems over-zealous. I'd like to understand under what circumstances it is insufficient.

In my mind, for responses that would "generally" be cacheable, the following logic would apply:

  • it's not cacheable if any per-user permission checks (or rate limits) were applied
  • it's not cacheable if the output depends on any user settings

Am I missing something?

Are we setting cache response headers differently based on whether the user is authorised via OAuth or cookie?

Are we setting cache response headers differently based on whether the user is authorised via OAuth or cookie?

No, for REST API we definitely don't.

There's 2 issues here:

  1. Caching non-public content - that's what Daniel's proposal of collecting and checking permissions is about. This is orthogonal to authentication method.
  2. Caching a cookie. This is a real big problem here. Bill's patch will likely fix it for now, since it basically drops caching for authenticated requests, but perhaps we need even more protection of WebResponse level? Cause it is not impossible that a Set-Cookie will be issued together with a response that did not do any permission checks at all, and once/if we switch to collecting permission checks we can easily break this again.

Here's the log of action API requests which HeaderCallback had to fix. It's pretty clear the logic in ApiMain::setCacheMode does not work. I think this is hard enough to get right that we should avoid every component reimplementing the logic.

IMO the dream solution here would be to have some means of disabling sessions for endpoints (like MW_NO_SESSION does) and only allow public caching for those endpoints. That avoids Set-Cookie headers in a robust way (in T264378: ATS-BE Lua mitigations for cacheable responses w/ Set-Cookie seemingly not working it is being speculated that even emitting Cache-Control: public and Set-Cookie in two separate requests for the same URL might be problematic), avoids any dependency of the content on the user's identity (which is not trivial, things like revdel access checks can happen pretty deep down), and probably means a nontrivial performance speedup as well for fast endpoints. But, it would have to be done very early in the setup phase, where the router is normally not yet available; and I'm not sure how it would mesh with OAuth client keys.

(in T264378: ATS-BE Lua mitigations for cacheable responses w/ Set-Cookie seemingly not working it is being speculated that even emitting Cache-Control: public and Set-Cookie in two separate requests for the same URL might be problematic)

Just to be clear: the speculation is only about false positives being logged, not about cookies being cached or returned to the wrong user.

Change 632800 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Inject a permission manager and logger into the Core REST API Handler

https://gerrit.wikimedia.org/r/632800

Removing myself as assignee, as I am not actively working on this.

Change 861391 had a related patch set uploaded (by TK-999; author: TK-999):

[mediawiki/core@master] Rest: Prevent caching responses for logged-in users

https://gerrit.wikimedia.org/r/861391

Adding Platform Engineering project tag as tasks need to be discoverable and as the current PET Clinic Duty tag got archived in T266534

Change 861391 merged by jenkins-bot:

[mediawiki/core@master] Rest: Prevent caching responses for logged-in users

https://gerrit.wikimedia.org/r/861391