
rest gateway: use rlc from sessionJwt cookie even when a bearer token is used
Closed, Resolved (Public)

Description

Bearer tokens may not have an rlc (rate-limit class) claim. This is especially true for long-term owner-only tokens: long-term tokens should not grant access levels, since they can't easily be revoked.

This currently means that there is no way to get elevated rate limits when using bearer tokens to authenticate. Ideally, clients would switch to a refresh-token flow, but as a short-term solution, we will just set a sessionJwt cookie for clients that authenticate using a bearer token (T417833).

To make this work, the rest gateway needs to examine both tokens (the bearer token and the one in the cookie) and compare/combine the information from them.
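As a rough illustration of "examining both tokens", here is a minimal Python sketch that pulls the claims out of the Authorization header and the sessionJwt cookie. It is not the gateway's actual code (that lives in the Envoy configuration). The function names and the lower-cased header dict are assumptions made for the example, and signature verification is deliberately left out, so the result must not be trusted for authentication.

```python
import base64
import json


def decode_claims(jwt: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    try:
        payload = jwt.split(".")[1]
        payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except (IndexError, ValueError):
        return {}
    return claims if isinstance(claims, dict) else {}


def claims_from_request(headers: dict) -> tuple[dict, dict]:
    """Return (bearer_claims, cookie_claims); each is {} if absent or unparsable.

    `headers` is assumed to map lower-cased header names to their raw values.
    """
    bearer_claims = {}
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer":
        bearer_claims = decode_claims(token.strip())

    cookie_claims = {}
    for part in headers.get("cookie", "").split(";"):
        name, _, value = part.strip().partition("=")
        if name == "sessionJwt":
            cookie_claims = decode_claims(value)
            break
    return bearer_claims, cookie_claims
```

An OAuth 1 Authorization header, which is not a JWT, simply yields empty bearer claims here, so only the cookie contributes information in that case.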

Event Timeline

Change #1241581 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/deployment-charts@master] rest-gateway: use rlc claim from cookie with bearer token

https://gerrit.wikimedia.org/r/1241581

The simplest version would be to just ignore the header when the cookie is present. That is not ideal, but can be made to work if we ensure on the MediaWiki side that requests with a mismatching bearer token and cookie get rejected. It's not ideal for two reasons. First, with JS-based clients the cookie might not really be under the control of the client (such clients are not used much and, until recently, have been implemented weirdly, see T323867: Clarify use of non-confidential OAuth 2.0 clients, but they do exist). Second, the cookie name is shared with normal cookie-based sessions, so an OAuth client might accidentally end up sending a cookie that breaks it. (We could use a different cookie name, but I imagine that would complicate things in Envoy in a different direction.)

In most normal scenarios this won't be a problem because 1) having both normal cookies and OAuth headers is rare (OAuth 1 cannot be used on the client-side, OAuth 2 is not used much, it's used even less on the client-side, and it isn't really practical for a client-side OAuth 2 app to use an owner-only consumer since using the authorization code flow would be much simpler) and 2) as long as the OAuth request and the normal cookie-based session are for the same user, they would end up creating the same cookie anyway, so it will be compatible with both.

The more ideal version would probably treat the request as untrusted if the sub fields of the cookie JWT and the access token differ, and prefer the rlc from the access token if they have the same subject. (Owner-only access tokens won't have an rlc field. For non-owner-only access tokens, in the future the rlc might be more permissive compared to a cookie, due to T409305: Replace API Gateway rate limit overrides with rate limit classes.)

treat the request as untrusted if the sub field of the cookie JWT and the access token differs

The format of this field is not quite the same, right? We'd have to parse it to extract the user ID?

In what situation could these two values differ? How could it be abused?

For non-owner-only access tokens, in the future the rlc might be more permissive compared to a cookie, due to T409305: API tokens: use rate limit classes instead of rate limit overrides.

That makes things rather complicated... unless we say we only use the rlc from the cookie if the bearer token doesn't have one. It would be tricky to determine which class is more permissive in Envoy. Not entirely impossible, but it would add quite a bit of complexity.

The format of this field is not quite the same, right? We'd have to parse it to extract the user ID?

It's the same as normal access tokens, but not the same as owner-only access tokens (which are forever valid so we are bound to whatever was introduced ten years ago).

Owner-only access tokens use the central user ID as sub. Normal access tokens and JWT cookies use mw:CentralAuth::<central user ID>.
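To make the two sub formats comparable, both would have to be reduced to the bare central user ID first. A minimal Python sketch of that normalization follows. The function names are made up for illustration, and it assumes the central user ID is a plain decimal number, which this thread does not spell out.

```python
from typing import Optional

# Prefix used by normal access tokens and JWT cookies, per the comment above.
CENTRAL_AUTH_PREFIX = "mw:CentralAuth::"


def central_user_id(sub) -> Optional[str]:
    """Reduce either sub format to the bare central user ID, or None."""
    if isinstance(sub, int) and not isinstance(sub, bool):
        sub = str(sub)  # tolerate a numeric JSON value for owner-only tokens
    if not isinstance(sub, str):
        return None
    if sub.startswith(CENTRAL_AUTH_PREFIX):
        sub = sub[len(CENTRAL_AUTH_PREFIX):]
    # Assumption: central user IDs are decimal integers.
    return sub if sub.isascii() and sub.isdigit() else None


def same_subject(bearer_sub, cookie_sub) -> bool:
    """True only if both subs are recognized and name the same central user."""
    bearer_id = central_user_id(bearer_sub)
    return bearer_id is not None and bearer_id == central_user_id(cookie_sub)
```

For example, `same_subject("12345", "mw:CentralAuth::12345")` is true, while a sub with an unrecognized prefix never matches anything.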

In what situation could these two values differ?

When somebody uses an OAuth app that runs in the browser (gadget / browser extension / TamperMonkey script), on a Wikipedia site (not cross-domain - for those you won't get cookies at all), and they use the OAuth app with a different user identity from what they are logged in with on the site (a bot account maybe). That's very implausible but technically possible.

How could it be abused?

As long as the traffic layer only uses JWTs for rate limiting, and MediaWiki only uses them for invalidating requests, not much. I guess you could spawn throwaway user accounts and have your request authenticate as your non-throwaway account, but use the quotas of the throwaway ones, or the other way around. I don't think it's worth worrying about.

What's more likely (although still very unlikely) is someone accidentally ending up with a mismatching header/cookie when using an OAuth client that runs in the browser, and getting logged out and/or their OAuth requests rejected because the JWT cookie does not match the other parts of the request.

That makes things rather complicated... unless we say we only use the rlc from the cookie if the bearer token doesn't have one. It would be tricky to determine which class is more permissive in Envoy. Not entirely impossible, but it would add quite a bit of complexity.

I think it'd be fine to always prioritize the bearer token's rlc when it exists. Owner-only tokens don't have one, normal bearer tokens with short expiry don't need a cookie (and won't set one, but you can end up with one anyway with browser-based clients like above).

I think it'd be fine to always prioritize the bearer token's rlc when it exists. Owner-only tokens don't have one, normal bearer tokens with short expiry don't need a cookie (and won't set one, but you can end up with one anyway with browser-based clients like above).

I still don't quite understand the issue with always prioritizing the cookie, if it's there. Isn't that the more reliable source of information? E.g. if someone uses a JS app in the browser, why would it be a problem if we based the rate limit on the user's session rather than the oauth token?

If we prioritize the bearer token, it seems to me that there would be a higher chance of using stale information...

What does MediaWiki base its session handling on when there is both a session cookie and a bearer token? Which one takes precedence?

OAuthRateLimiter sets higher rate limits for select applications, and it's based on consumer ID, not user ID, so it cannot be reproduced in the normal session cookie. (It can be reproduced in the OAuth session cookie, but with cookies you can never be sure which one you get.) That said, there are like five such applications and they are all internal or use OAuth 1, so I'm probably overcomplicating things here and this is an academic concern only.

Wrt staleness, JWT cookies and non-owner-only access tokens have similar expiry times. We could update the cookie proactively when we see that something in it is incorrect (but don't actually do that today) so theoretically it's even more up-to-date, but both of them are fairly fresh. And owner-only access tokens don't have rlc anyway.

What does MediaWiki base its session handling on when there is both a session cookie and a bearer token?

The bearer token always gets priority. But Envoy ignores the real authentication cookies, it only looks at the JWT cookie, and there is no way to tell whether that is coming from the cookie-based session handling or OAuth-based session handling (once that one also starts setting cookies).
I guess we could just have a session type field in the JWT, that could make logs/debugging less confusing at the very least.

These are the possible authentication types (once we finish the pending bot pw / OAuth work):

| session type | JWT cookie | Authorization header |
| --- | --- | --- |
| normal web-based | yes | no |
| bot password | yes | no |
| OAuth 1 | yes | yes, but not a JWT |
| OAuth 2 owner-only | yes | yes, but no rlc field |
| OAuth 2 (normal) | no | yes |

So in all these scenarios there is only one rlc source. The only reason to end up with two contradictory rlcs is that cookies are hard to control, especially in a web browser. So you could have a normal OAuth 2 app that runs in the browser within Wikimedia pages (maybe as a gadget or browser plugin), and signs its requests with an access token, but doesn't explicitly opt out of cookies, so the JWT cookie (for the normal web-based session, probably) in the browser's cookie jar also gets sent.

In that scenario the OAuth header is always going to be more reliable, so if you see both types of JWTs *and* both of them have an rlc field, it's better to prioritize the Authorization token. But this situation is very unlikely to happen in practice.
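The precedence rule discussed here (use the Authorization token's rlc if it has one, otherwise fall back to the cookie's) could be sketched in Python roughly as below. This only illustrates the rule as stated in this thread and is not necessarily what the merged patch does. The function name and the rate-limit class names in the usage example are hypothetical.

```python
from typing import Optional


def effective_rlc(bearer_claims: Optional[dict],
                  cookie_claims: Optional[dict]) -> Optional[str]:
    """Pick the rate-limit class for a request.

    The bearer token's rlc wins when present; otherwise the cookie's rlc
    is used. Returns None if neither carries one.
    """
    for claims in (bearer_claims, cookie_claims):
        rlc = (claims or {}).get("rlc")
        if rlc:
            return rlc
    return None


# Owner-only token (no rlc) plus sessionJwt cookie: the cookie's class applies.
print(effective_rlc({"sub": "12345"},
                    {"sub": "mw:CentralAuth::12345", "rlc": "example-class"}))
```

Because the fallback only triggers when the bearer token has no rlc, Envoy never has to decide which of two classes is more permissive.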

Change #1241581 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: use rlc claim from cookie with bearer token

https://gerrit.wikimedia.org/r/1241581