Add OAuth 2.0 support to MediaWiki REST API
Closed, Resolved (Public)

Description

In this stage, we will use OAuth 2.0 as an authorization mechanism for the MediaWiki REST API and the primary one for Wikimedia sites.

Note: "client ID" is another word for "API key".

Personas:

Developer - a software developer who uses the MediaWiki REST API on their own behalf or on behalf of users
User - a person who reads, contributes to, curates or administers a MediaWiki site

Event Timeline

eprodromou set the point value for this task to 55.

There's an OAuth extension (which only handles 1.0 currently) but if OAuth 2.0 is going to be "the primary authorization mechanism for the MediaWiki REST API" as the task description says, I imagine it has to happen within MediaWiki core?

@Tgr I updated the ticket. I think we want to move to OAuth 2.0 within Wikimedia, but I also think that implementing as an extension isn't out of the question.

There's a top-level initiative to build an OAuth 2.0 extension at T229500. I think it's separate from the OAuth 1.0 one, but I believe there's some talk about merging it... I'm not directly involved, obvs.

@Tgr let me know if the wording is clearer.

eprodromou updated the task description.
eprodromou removed the point value for this task.
eprodromou added a subscriber: EvanProdromou.

I read RFC 6749 and have some observations.

There are two sorts of client that have traditionally not used OAuth, so if OAuth 2.0 is going to be required for all clients, then we have to figure out what to do about them.

Internal JavaScript

The first sort of client is JS code internal to MediaWiki, delivered by MediaWiki core or an extension and run in the browser while a local page is viewed. For example, VisualEditor is able to perform requests to the existing REST API without any client authentication process. However, write actions from these clients require a CSRF token. I believe that this sort of client is best served in OAuth by a client credentials grant. The request flow for a client credentials grant is essentially identical to a request to the action API with a CSRF token:

  • The client makes a request to the token endpoint with grant_type=client_credentials. No other parameter is required. The application is responsible for authenticating the request, and I believe it's compliant to do this with MW's existing cookie authentication. There is no client_id -- any concept of client identification is unspecified and private to the application. A token is returned in the JSON response.
  • The token is then used to authorize subsequent requests. A "bearer" token is very similar to an existing CSRF token and is much simpler to implement than the MAC authentication used in OAuth 1.0.
  • The "scope" parameter is optional and has the same semantics as "salt" in the existing CSRF protection system. We could follow the action API's example and define scopes named after the "salts" in ApiQueryTokens::getTokenTypeSalts().
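A minimal sketch of the request flow described above, assuming a form-encoded token request as in RFC 6749. The function names are hypothetical, and the scope value "csrf" is only an illustration of a scope named after a token type. Authentication is assumed to come from the browser's existing session cookie, so it does not appear in the request body.

```python
from urllib.parse import urlencode


def build_token_request(scope=None):
    """Return (headers, body) for a client_credentials token request.

    No client_id is sent; the session cookie is assumed to identify the user.
    """
    params = {"grant_type": "client_credentials"}
    if scope is not None:
        # Optional, playing the role the "salt" plays for CSRF tokens.
        params["scope"] = scope
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return headers, urlencode(params)


def bearer_headers(access_token):
    """Headers that authorize a subsequent request with the returned token."""
    return {"Authorization": f"Bearer {access_token}"}
```

The token from the JSON response would be passed to bearer_headers() for each later write request, much as a CSRF token is attached to action API requests today.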

Bots without a web server

Bots make various kinds of edits or uploads to our servers, using the username and password configured by the bot operator. They typically do not have a web server or URL associated with them and so have not used OAuth 1.0. For these clients, I think it's correct to use OAuth 2.0 with resource owner password credentials.

The client makes a request to the token endpoint with grant_type=password, username=..., password=.... No client_id is required.

The response includes an access token and, if the server issues one, a refresh token. The access token is used to make requests, whereas the refresh token is used to obtain new access tokens without asking for the password again. So bot authors can improve their security over the current situation by asking for a username and password only on first run and storing the refresh token for continued access. The password does not need to be stored.
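A rough sketch of what a bot would send under this proposal, assuming form-encoded bodies as in RFC 6749. The function names are hypothetical. The first body is sent once with the operator's credentials; afterwards only the stored refresh token is needed.

```python
from urllib.parse import urlencode


def password_grant_body(username, password):
    """Body for the first run, when the operator types in credentials."""
    return urlencode(
        {"grant_type": "password", "username": username, "password": password}
    )


def refresh_grant_body(refresh_token):
    """Body for later runs, using the stored refresh token only."""
    return urlencode({"grant_type": "refresh_token", "refresh_token": refresh_token})


def tokens_to_store(token_response):
    """Keep only what must survive between runs; the password is never stored.

    Returns None if the server did not issue a refresh token.
    """
    refresh_token = token_response.get("refresh_token")
    if refresh_token is None:
        return None
    return {"refresh_token": refresh_token}
```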

Consequences and implementation

OAuth 1.0 is implemented in an extension. The extension mostly consists of the UI code needed to handle client registration and authorization. It would be very awkward to merge this extension into core and require client registration for all clients wishing to use the REST API. However, for the two grant types above, no client registration is needed. Approximately zero code is needed from the OAuth extension to support these two grant types in MediaWiki core.

My proposal is to support these two simple grant types (client_credentials and password) in the REST API in core, as a replacement for CSRF tokens. The extremely simple nature of these grant types points to a DIY implementation; we would not use a library for this. Client registration and the two other grant types (authorization code and implicit) would be implemented in the OAuth extension and would share a UI with OAuth 1.0 registration. MW core would provide extension interfaces to the extent necessary to avoid conflicts between the extension and core implementations of OAuth.

We could have OAuth 2.0 authorization and token endpoints implemented in the core REST API. The OAuth extension would extend these core endpoints by providing additional grant types; there would not be separate OAuth 2.0 endpoints in the extension. If the OAuth extension is not installed, a client requesting to use grant_type=authorization_code would receive an error message.

This comment was removed by tstarling.

Internal JavaScript

The first sort of client is JS code internal to MediaWiki, delivered by MediaWiki core or an extension and run in the browser while a local page is viewed. For example, VisualEditor is able to perform requests to the existing REST API without any client authentication process. However, write actions from these clients require a CSRF token. I believe that this sort of client is best served in OAuth by a client credentials grant. The request flow for a client credentials grant is essentially identical to a request to the action API with a CSRF token:

  • The client makes a request to the token endpoint with grant_type=client_credentials. No other parameter is required. The application is responsible for authenticating the request, and I believe it's compliant to do this with MW's existing cookie authentication. There is no client_id -- any concept of client identification is unspecified and private to the application. A token is returned in the JSON response.
  • The token is then used to authorize subsequent requests. A "bearer" token is very similar to an existing CSRF token and is much simpler to implement than the MAC authentication used in OAuth 1.0.
  • The "scope" parameter is optional and has the same semantics as "salt" in the existing CSRF protection system. We could follow the action API's example and define scopes named after the "salts" in ApiQueryTokens::getTokenTypeSalts().

We'd probably need to register an "internal" consumer with the OAuth extension so that it can be used to generate the token. We'd also somehow have to deal with violating the traditional separation where MediaWiki core doesn't depend on extensions, if nothing else by implementing a very minimal version of OAuth 2 for this internal consumer in core (separate from the OAuth 2 in the OAuth extension that's usable by external clients).

An alternative would be to use the refresh_token flow, embedding the refresh token in the JavaScript instead of the client credentials. But it doesn't really matter, as either way we'd have to consider someone extracting the credentials/token from the source and using them in some malicious app as mentioned already in T221161#5131809. They'd have to somehow be short-lived enough to mitigate that while still being long-lived enough to not have annoying timeouts for legitimate users accessing the page.

Bots without a web server

Bots make various kinds of edits or uploads to our servers, using the username and password configured by the bot operator. They typically do not have a web server or URL associated with them and so have not used OAuth 1.0.

We implemented owner-only consumers specifically so such bots can use our OAuth 1.0a. The basic idea is that, instead of the client performing the usual request-for-approvals process, the web UI displays the access token+secret that would be the result of the request and that is configured directly into the bot.

For OAuth 2.0, we could offer the same thing, or we could have them use the client_credentials flow.

For these clients, I think it's correct to use OAuth 2.0 with resource owner password credentials.

Note, though, that MediaWiki may no longer have a password, or may require 2FA or other things in addition to just a username and password.

In reviews for the current OAuth 2.0 project, I recommended against even trying to support the password flow since it's likely to be extremely fragile.

We'd probably need to register an "internal" consumer with the OAuth extension so that it can be used to generate the token. We'd also somehow have to deal with violating the traditional separation where MediaWiki core doesn't depend on extensions, if nothing else by implementing a very minimal version of OAuth 2 for this internal consumer in core (separate from the OAuth 2 in the OAuth extension that's usable by external clients).

You're putting the split between core and extension at a slightly different place to me. I'm going to flesh out my proposal a bit more since I think that will help us decide how to do this.

Concepts:

  • OAuth 2.0 "scope" is mapped directly to MW core's concept of "grants".
  • Access tokens are opaque to the client, but to MW, they are namespaced, e.g. access_token=eo.12345678 for a token issued by Extension:OAuth.
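The two concepts above could be sketched as follows. This assumes the namespace is everything before the first dot, as in the eo.12345678 example, and that scopes are space-delimited as in RFC 6749. The grant names in KNOWN_GRANTS are an illustrative subset and the function names are hypothetical.

```python
# Illustrative subset of grant names; the real list comes from MW core's grants.
KNOWN_GRANTS = {"basic", "editpage", "highvolume"}


def split_token(access_token):
    """Split 'namespace.secret' into its parts; raise ValueError if malformed."""
    namespace, sep, secret = access_token.partition(".")
    if not sep or not namespace or not secret:
        raise ValueError("access token has no namespace prefix")
    return namespace, secret


def scope_to_grants(scope):
    """Map a space-delimited scope string directly onto grant names."""
    requested = scope.split()
    unknown = [name for name in requested if name not in KNOWN_GRANTS]
    if unknown:
        raise ValueError("unknown grants: " + ", ".join(unknown))
    return requested
```

Only MW would call split_token(); clients keep treating the whole string as opaque.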

Core will provide the following facilities:

  • An authorization endpoint in the REST API
    • A response_type registry, empty by default.
    • Requests for known response_type values will be forwarded to the registered handler, perhaps passing a Rest\RequestInterface and returning a Rest\ResponseInterface.
  • A token endpoint in the REST API
    • A grant_type registry, similar to the response_type registry.
  • A grant_type=client_credentials module
    • A requested scope is allowed if it is a valid grant, per the existing core concept of grants.
  • A SessionProvider
    • Similar to the existing MWOAuthSessionProvider, but activates itself on "Authorization: Bearer" instead of "Authorization: OAuth"
    • A registry of access token namespace modules.
    • The session provider will delegate method calls like getAllowedUserRights() to the access token namespace module.
  • A token namespace module for client_credentials

Extension:OAuth will provide:

  • A response_type=code module.
  • A grant_type=authorization_code module.
  • A grant_type=refresh_token module.
  • A token namespace module.
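A minimal sketch of the grant_type registry in the split above, assuming handlers are plain callables that take the request parameters and return a status and JSON body. The class and function names are hypothetical, as is the "core" token prefix; "eo" is the prefix used in the example above. The error codes are the ones RFC 6749 defines for the token endpoint.

```python
import secrets


class TokenEndpoint:
    """Token endpoint that forwards requests to registered grant_type handlers."""

    def __init__(self):
        self._grant_types = {}

    def register_grant_type(self, name, handler):
        self._grant_types[name] = handler

    def handle(self, params):
        grant_type = params.get("grant_type")
        if grant_type is None:
            return 400, {"error": "invalid_request"}
        handler = self._grant_types.get(grant_type)
        if handler is None:
            # E.g. authorization_code when Extension:OAuth is not installed.
            return 400, {"error": "unsupported_grant_type"}
        return handler(params)


def core_client_credentials(params):
    # Core-issued tokens carry a hypothetical "core" namespace prefix.
    token = "core." + secrets.token_hex(16)
    return 200, {"access_token": token, "token_type": "bearer"}


def extension_authorization_code(params):
    # A real handler would validate params["code"] first; omitted in this sketch.
    token = "eo." + secrets.token_hex(16)
    return 200, {"access_token": token, "token_type": "bearer"}


def make_core_endpoint():
    """Endpoint as core alone would provide it, before any extension registers."""
    endpoint = TokenEndpoint()
    endpoint.register_grant_type("client_credentials", core_client_credentials)
    return endpoint
```

Extension:OAuth would call register_grant_type("authorization_code", ...) on the same endpoint rather than adding an endpoint of its own. The response_type registry on the authorization endpoint would have the same shape.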

It's complicated, but if we merged the OAuth extension into core, or implemented OAuth 2.0 solely in Extension:OAuth, the structure would still be similar. You'd just be able to simplify the registries by turning them into switch statements. I'm not wedded to this core/extension split, but if we want to use OAuth 2.0 in the core REST API as a replacement for CSRF tokens, then it seems like we either need this split or we need to merge the whole extension into core.

An alternative would be to use the refresh_token flow, embedding the refresh token in the JavaScript instead of the client credentials. But it doesn't really matter, as either way we'd have to consider someone extracting the credentials/token from the source and using them in some malicious app as mentioned already in T221161#5131809. They'd have to somehow be short-lived enough to mitigate that while still being long-lived enough to not have annoying timeouts for legitimate users accessing the page.

I just want to confirm you're not confusing refresh_token with client_id. A refresh_token authorizes a particular user+app combination and is private data. A client_id identifies an app and is public data. Embedding a refresh_token in a page has approximately the same security characteristics as embedding a session ID. You could put a refresh_token on a page, as long as the page requires authorization to view and is uncacheable. If a malicious app can steal a refresh_token from such a context, it can presumably steal session cookies or whatever. You definitely can't commit a refresh_token to a public git repository.

I don't see the need to put a refresh token on a page. Say you had an internal web UI which performed a write action, like editing. Putting an access token on the page would avoid the need for an additional request, since the access token could immediately be used to make write requests to the REST API. If you only put a refresh token on the page, you would need to make a request to the token endpoint to get an access token before making such a REST request, so performance would be equivalent to using client_credentials. Refresh tokens are long-lived, which seems unnecessary and undesirable for uncacheable, private page views.

In reviews for the current OAuth 2.0 project, I recommended against even trying to support the password flow since it's likely to be extremely fragile.

Fair point, let's not have grant_type=password.

We'd probably need to register an "internal" consumer with the OAuth extension so that it can be used to generate the token. We'd also somehow have to deal with violating the traditional separation where MediaWiki core doesn't depend on extensions, if nothing else by implementing a very minimal version of OAuth 2 for this internal consumer in core (separate from the OAuth 2 in the OAuth extension that's usable by external clients).

You're putting the split between core and extension at a slightly different place to me. I'm going to flesh out my proposal a bit more since I think that will help us decide how to do this.

Note that a full implementation of OAuth 2.0 in Extension:OAuth is already nearly done.

  • A grant_type=client_credentials module
    • A requested scope is allowed if it is a valid grant, per the existing core concept of grants.

How would this interact with Extension:OAuth also wanting to have support for grant_type=client_credentials for its registered consumers? Multiple backends, with the module going with the first backend that responds successfully to the passed client ID+secret?
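One way to read the "multiple backends" idea, as a sketch only: each backend either claims the request or declines, and the module goes with the first one that claims it. The backend functions, the consumer registry and its contents are all hypothetical. Core's backend is assumed to claim only requests that carry no client ID or secret, relying on cookie authentication having happened already.

```python
import hmac

# Hypothetical consumer registry owned by Extension:OAuth.
REGISTERED_CONSUMERS = {"example-client": "example-secret"}


def extension_backend(client_id, client_secret):
    """Claim the request if the ID and secret match a registered consumer."""
    expected = REGISTERED_CONSUMERS.get(client_id)
    if expected is None or client_secret is None:
        return None
    # Constant-time comparison to avoid leaking the secret through timing.
    if hmac.compare_digest(expected.encode(), client_secret.encode()):
        return "extension"
    return None


def core_backend(client_id, client_secret):
    """Claim only the internal case, where no client credentials are passed."""
    if client_id is None and client_secret is None:
        return "core"
    return None


def first_successful(backends, client_id=None, client_secret=None):
    """Return the result of the first backend that accepts the credentials."""
    for backend in backends:
        result = backend(client_id, client_secret)
        if result is not None:
            return result
    return None
```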

Other than that, the proposal makes sense to me.

An alternative might be to have two separate implementations: separate authorization endpoints, separate token endpoints, and separate SessionProviders (there's nothing stopping multiple providers from looking for Authorization: Bearer; SessionManager will go with whichever one returns a SessionInfo with the highest "priority"). The core endpoints would be documented as internal, for use by frontend JS only. All external clients would use the extension's endpoints. I don't have any preference for this implementation over the one you proposed; I'm just noting it as a possibility.
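The priority-based selection mentioned in parentheses can be sketched like this. It is a simplified model, not MediaWiki's SessionManager: the SessionInfo class here is a stand-in with only two fields, the priority numbers are arbitrary, and the accepts predicates stand in for whatever token validation each provider would really do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionInfo:
    provider: str
    priority: int


def bearer_token(headers):
    """Return the token from 'Authorization: Bearer <token>', else None."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    return token if scheme == "Bearer" and token else None


def make_provider(name, priority, accepts):
    """Build a provider that claims Bearer tokens for which accepts(token) is true."""
    def provider(headers):
        token = bearer_token(headers)
        if token is not None and accepts(token):
            return SessionInfo(name, priority)
        return None
    return provider


def pick_session(providers, headers):
    """Ask every provider, then keep the SessionInfo with the highest priority."""
    infos = [info for info in (p(headers) for p in providers) if info is not None]
    return max(infos, key=lambda info: info.priority, default=None)
```

With two such providers registered, a token that both recognize goes to the higher-priority one, and a header using another scheme such as "Authorization: OAuth" is ignored by both.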

I just want to confirm you're not confusing refresh_token with client_id. A refresh_token authorizes a particular user+app combination and is private data. A client_id identifies an app and is public data. Embedding a refresh_token in a page has approximately the same security characteristics as embedding a session ID. You could put a refresh_token on a page, as long as the page requires authorization to view and is uncacheable. If a malicious app can steal a refresh_token from such a context, it can presumably steal session cookies or whatever. You definitely can't commit a refresh_token to a public git repository.

Embedding a client ID+secret in the page seems to have the same security characteristics as well, as it's one simple POST request to turn that into a refresh token (and access token) without user interaction.

Or did you have some other plan in mind for getting the client ID+secret to the frontend JS in order for it to be able to make the client_credentials call, where providing a refresh token instead wouldn't work in the same way?

I don't see the need to put a refresh token on a page. Say you had an internal web UI which performed a write action, like editing. Putting an access token on the page would avoid the need for an additional request, since the access token could immediately be used to make write requests to the REST API.

True. And if we go that route, we're past really using OAuth 2.0 at all. We could just as well have a SessionProvider looking for "Authorization: MediaWiki $token", or "MediaWiki-Auth: $token". Which itself is basically just a version of "Cookie: foowiki_session=$token" that isn't subject to browser-based CSRF.

But that would violate the strong push for OAuth 2.0 as the only method for using the REST API. I've yet to see or hear a good explanation for that. "API keys"[1] could be as easily done with a "MediaWiki-API-Key" header. CSRF for the REST API could be done similarly, or with a parameter to the handler as is done for Action API modules.

[1]: For that matter, I've yet to see or hear a particularly good use case for required API keys either. "So we can sell SLAs to $BIGCORP" only needs optional keys, if it can't just be done by source IP instead. "So we can block misbehaving apps" would require that the API key be kept secret, which, as we know, many clients can't do, so a malicious app could just steal a key from one of them (and a naive developer might well do so accidentally). Or if we have every instance of such clients somehow obtain an instance key to keep in some variety of local storage, how would we prevent a malicious app from doing the same?