Define best practice for single-user apps which need a high MediaWiki API rate limit
Closed, Resolved (Public)

Description

T398815: WE5.1.2 Verifiable MediaWiki sessions will introduce lower API rate limits for untrusted users, with trusted users being identified by verifiable session tokens (either a JWT cookie or an OAuth 2 access token). We want these tokens to expire quickly, as otherwise the rate limits are very easy to circumvent. For gadgets and such (which will automatically benefit from the JWT cookie set as part of normal cookie-based session management) and multi-user OAuth 2 applications, this will work fine. For OAuth 1 apps and bot password based clients, we'll issue JWT cookies as a backwards-compatibility (B/C) mechanism, but we'll recommend that developers move away from these authentication mechanisms - high-traffic apps should all use OAuth 2, as is standard these days.

What's not entirely clear is what single-user apps should do. Today, the best practice for that use case is an owner-only OAuth 2 app. But owner-only apps use an access token with infinite expiry, which we don't want for high-rate-limit clients. Should they use a normal OAuth 2 app with the Client Credentials grant? Or should there be some mechanism to use owner-only apps with a refresh token and short-lived access tokens? Also, how should this be explained on the OAuth 2 app registration interface?

(We'll issue JWT cookies as a B/C mechanism for OAuth 2 owner-only apps as well, but don't want to rely on that as a permanent measure.)

Event Timeline

Should they use a normal OAuth 2 app with the Client Credentials grant?

Client credentials currently result in an anonymous session, making them useless for most purposes: T417278: Choosing client credentials grant for OAuth 2 results in an access token (JWT) with the 'sub' field empty
Whether that's a bug or intended behavior is debatable, but it's a blocker for this type of use case.

Or should there be some mechanism to use owner-only apps with a refresh token and short-lived access tokens?

One thing we could do is to essentially replace the access token with the refresh token: instead of owner-only consumers giving you a never-expiring access token, they would give you a never-expiring refresh token, which you'd periodically use to get a short-lived access token. But this has several problems:

We should audit the full list of differences between the owner-only and normal code paths; that would give us a better idea of which features need to be recreated in whatever we use for the new recommended single-user OAuth mechanism.

For OAuth 1 apps and bot password based clients, we made the intentional choice of not supporting higher rate limits - high-traffic apps should all use OAuth 2, as is standard these days.

This seems like an arbitrary choice that's to the detriment of most existing users.

Bot passwords already have a session cookie (and require it for sessions to function), so if you can support your edge rate limiting with normal session cookies there should be no reason you can't support it with bot passwords too.

The OAuth SessionProvider makes use of a session cookie optional, and AFAICT WMF currently does not enable it, but you could enable it. Then, again, your rate limiting should be able to work just like it does with normal session cookies.

Client credentials currently result in an anonymous session, making them useless for most purposes: T417278: Choosing client credentials grant for OAuth 2 results in an access token (JWT) with the 'sub' field empty
Whether that's a bug or intended behavior is debatable, but it's a blocker for this type of use case.

I believe that's the intended behavior of the OAuth spec. Client credentials are supposed to be for the client itself, not any user of the site, and therefore aren't associated with any user of the site.

For OAuth 1 apps and bot password based clients, we made the intentional choice of not supporting higher rate limits - high-traffic apps should all use OAuth 2, as is standard these days.

This seems like an arbitrary choice that's to the detriment of most existing users.

It's certainly not arbitrary - it's based on the various issues with relying on long-lived tokens (easy to abuse, gets out of sync, makes our code harder to maintain). But yeah it's a trade-off against client developer convenience.

Bot passwords already have a session cookie (and require it for sessions to function), so if you can support your edge rate limiting with normal session cookies there should be no reason you can't support it with bot passwords too.

The OAuth SessionProvider makes use of a session cookie optional, and AFAICT WMF currently does not enable it, but you could enable it. Then, again, your rate limiting should be able to work just like it does with normal session cookies.

Session cookies specifically are not useful here, since they are just pointers to a storage backend and the edge infrastructure can't afford store lookups. But yeah we can use some kind of cookie, and that's the current plan:
T415007: Login with `action=login` and bot password does not create a JWT session cookie
T417833: Set a JWT cookie for OAuth 1 requests and OAuth 2 owner-only requests

Using a cookie is also a DX tradeoff, although less so (using a cookie jar is simpler than using an OAuth 2 client). It's also a bit less reliable (OAuth 2 is complex enough that no one rolls their own implementation, but people do implement cookie handling manually and then you end up with clients that e.g. ignore cookie expiry times), and either we use lots of different cookies and then the edge infrastructure needs some complex conflict resolution logic and needs to know a lot about MediaWiki internals (times two, because we have two independent layers outside MediaWiki doing rate limiting), or we use the same cookie name for all session types and then you get all sorts of weird behavior with one session affecting the other.

So IMO cookies are more a workaround than a proper solution, and we still want to get to the point where we can say "all you need is an OAuth 2 library".

Client credentials currently result in an anonymous session, making them useless for most purposes: T417278: Choosing client credentials grant for OAuth 2 results in an access token (JWT) with the 'sub' field empty
Whether that's a bug or intended behavior is debatable, but it's a blocker for this type of use case.

I believe that's the intended behavior of the OAuth spec. Client credentials are supposed to be for the client itself, not any user of the site, and therefore aren't associated with any user of the site.

The spec is very open-ended on this:

The client can request an access token using only its client credentials (or other supported means of authentication) when the client is requesting access to the protected resources under its control, or those of another resource owner that have been previously arranged with the authorization server (the method of which is beyond the scope of this specification).

So the user registering the client credentials being the "other resource owner" would be within the letter of the rules. It might still be confusing or surprising to client developers though.

After discussing this, the current plan is to fix T417278: Choosing client credentials grant for OAuth 2 results in an access token (JWT) with the 'sub' field empty so that access tokens obtained via the client credentials flow are bound to the app owner's user account, and behave the same way as other kinds of access tokens, and then recommend client credentials as the replacement for owner-only tokens. Exchanging client credentials for an access token is just a straightforward API request, so even without using an OAuth 2 library it is not too hard to do; and decent OAuth 2 libraries will hide the interaction entirely.
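
To illustrate how small that exchange is, here is a minimal Python sketch of a client credentials token request as described in the OAuth 2 spec. It is not an official client. The token endpoint URL and the choice to send the credentials as form fields are assumptions, and the function names are invented for this example. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the token endpoint exposed by the OAuth extension's REST API.
# Check the wiki's own documentation for the actual URL.
TOKEN_URL = "https://meta.wikimedia.org/w/rest.php/oauth2/access_token"


def build_token_request(client_id, client_secret, token_url=TOKEN_URL):
    """Build (but do not send) a client credentials token request."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # hypothetical credentials from app registration
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_token_response(raw):
    """Extract the access token and its lifetime (seconds) from the JSON response."""
    payload = json.loads(raw)
    return payload["access_token"], int(payload["expires_in"])
```

A client would send the request with `urllib.request.urlopen()`, pass the response body to `parse_token_response()`, and then put the token in an `Authorization: Bearer ...` header on API requests until it expires.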

It's certainly not arbitrary - it's based on the various issues with relying on long-lived tokens (easy to abuse, gets out of sync, makes our code harder to maintain). But yeah it's a trade-off against client developer convenience.

"Long-lived tokens" aren't any easier to abuse than "long-lived passwords". To me, this just looks like people are still looking for excuses to push OAuth 2 and are jumping on anything that sounds "good".

Session cookies specifically are not useful here,

You're the one who said earlier that gadgets are fine because they use the session cookie.

Using a cookie is also a DX tradeoff, although less so (using a cookie jar is simpler than using an OAuth 2 client). It's also a bit less reliable (OAuth 2 is complex enough that no one rolls their own implementation, but people do implement cookie handling manually and then you end up with clients that e.g. ignore cookie expiry times),

Sounds like FUD to me.

and we still want to get to the point where we can say "all you need is an OAuth 2 library".

"An OAuth 2 library" that happens to implement the specific flavor of OAuth 2 that the MediaWiki extension happens to use, maybe. Have one for every language people might want to use to run a bot?

So the user registering the client credentials being the "other resource owner" would be within the letter of the rules. It might still be confusing or surprising to client developers though.

That does seem likely.

@Tgr: I think the task description is somewhat outdated, and fixing it will alleviate Anomie's concern. The task description says:

For OAuth 1 apps and bot password based clients, we made the intentional choice of not supporting higher rate limits - high-traffic apps should all use OAuth 2, as is standard these days.

This is no longer true per T417833 and T415007: as long as the sessionJwt cookie is sent, higher rate limits are indeed supported.

We still want a clear recommendation for which auth method to use in which situation, and especially which OAuth 2 flow should be used instead of permanent tokens. But elevated rate limits are possible with all auth methods, as long as the client sends cookies.

"Long-lived tokens" aren't any easier to abuse than "long-lived passwords".

That's only true if the token can be revoked, which it only can if we look it up in a database, which would not be feasible to do in the gateway (or at the network edge). If we trust permanent JWTs there, it becomes impossible to block users from excessive API usage. With short-lived tokens, the block still takes a while to take effect (until the token has expired), but at least it works eventually.

Using permanent JWTs defeats the purpose of JWTs, namely that you can trust their content without the need to check against a database. That's not just because tokens may be leaked, but also because the privileges associated with a user account may change over time in ways that the user doesn't like.
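
A minimal Python sketch of that idea, for illustration only: a verifier that trusts a token based on nothing but its signature and its `exp` claim, with no database lookup. The HS256 algorithm, the shared key, and all function names are assumptions made to keep the example self-contained. They say nothing about how the gateway or MediaWiki actually sign or check tokens.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment):
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims, key):
    """Issue a token (issuer side); HS256 is an assumption for this sketch."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token, key, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None
    if not isinstance(claims, dict):
        return None
    now = time.time() if now is None else now
    exp = claims.get("exp")
    # A token without an expiry, or past it, is never trusted.
    if not isinstance(exp, (int, float)) or now >= exp:
        return None
    return claims
```

Because the verifier never consults a store, a block on the user can only take effect once `exp` has passed, which is why the token lifetime bounds how long a blocked client keeps its elevated limit.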

To me, this just looks like people are still looking for excuses to push OAuth 2 and are jumping on anything that sounds "good".

OAuth 2 is pretty much the standard by now, so why not push for it? Supporting it is valuable, and supporting several auth methods increases attack surface.
Of course, a full OAuth 2 flow with refresh tokens is more complex to implement. But there are plenty of libraries that will do it for you, once we have settled on the flow to use.
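
For a sense of what such a library does on the client's behalf, here is a minimal Python sketch of a token cache that fetches a new short-lived access token shortly before the old one expires. The class name, the `fetch` callback, and the 30-second safety margin are all assumptions for illustration. A real OAuth 2 library would also handle errors and retries.

```python
import time


class TokenCache:
    """Hand out a cached access token, refetching it shortly before expiry."""

    def __init__(self, fetch, margin=30.0, clock=time.monotonic):
        # fetch: hypothetical callable returning (token, lifetime_in_seconds),
        # e.g. a wrapper around a token endpoint request.
        self._fetch = fetch
        self._margin = margin
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Refetch if there is no token yet or it is within the safety margin of expiry.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

The bot's request code would call `cache.get()` before every API request, so token renewal stays invisible to the rest of the client.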

Session cookies specifically are not useful here,

You're the one who said earlier that gadgets are fine because they use the session cookie.

I think there's a slight confusion here caused by the fact that currently, there are two cookies - the new sessionJwt cookie that can be used by the gateway (and haproxy), and the traditional session cookie used by MediaWiki, which contains the session ID which needs to be looked up. That's an implementation detail that may change.

Using a cookie is also a DX tradeoff, although less so (using a cookie jar is simpler than using an OAuth 2 client).

Sounds like FUD to me.

The consideration is "use a library that implements the complex flow" vs "write your own code for the simpler flow". Which one is "better" depends on the specific case, but in general, it seems wise for security and maintainability to 1) stick to the standard and 2) reduce the number of options. However, forcing clients to migrate to a new protocol always comes at a cost and should be avoided. The cookie based solution will be supported for the foreseeable future. But then, if people have to touch their code anyway, why not recommend they go for the standard solution, rather than the work-around?

"An OAuth 2 library" that happens to implement the specific flavor of OAuth 2 that the MediaWiki extension happens to use, maybe. Have one for every language people might want to use to run a bot?

Figuring out the flavor that best serves the purpose and has wide library support is exactly the purpose of this ticket.

If I may dare summarize: it seems that the best practice we want to recommend is using the client credentials flow, but the current implementation does not allow performing actions as if logged in when using that flow.

I filed a new task about fixing that: T420297: Allow OAuth 2 apps using client credentials flow to perform actions as the app's owner (i.e. be logged in). Let's continue there.

(I don't want to shut off the discussion; we want to recommend that, but it's not decided yet. I think it will be clearer to centralize the discussion in that new task though.)