Create an inventory of current MediaWiki session authentication mechanisms used in Wikimedia production, and check if they can use transparent tokens (i.e. if tokens like session ID cookies can be turned into something that services outside MediaWiki can easily obtain information from).
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | JTweed-WMF | T392630 [Hypothesis] WE5.5.3 Developer Authentication Exploration |
| Resolved | | Tgr | T392633 [WE5.5.3 research spike] Inventory of current MediaWiki session authentication mechanisms |
Event Timeline
In the Wikimedia subset of MediaWiki, there are 7 session provider implementations:
- CookieSessionProvider (core)
- BotPasswordSessionProvider (core)
- CentralAuthSessionProvider (MediaWiki-extensions-CentralAuth)
- CentralAuthApiSessionProvider (MediaWiki-extensions-CentralAuth)
- CentralAuthHeaderSessionProvider (MediaWiki-extensions-CentralAuth)
- SessionProvider (MediaWiki-extensions-OAuth)
- NetworkSessionProvider (NetworkSession)
(There's also InstallerSessionProvider but it's not relevant.)
CookieSessionProvider and CentralAuthSessionProvider each provide two different authentication mechanisms (the session cookie and the user token cookie). CentralAuthApiSessionProvider and CentralAuthHeaderSessionProvider are essentially the same thing; they only differ in how the same credential is passed (header or GET/POST parameter). The OAuth SessionProvider is the joint entry point for two fairly different authentication protocols (OAuth 1 and 2). So in total we have nine different request authentication mechanisms in MediaWiki.
Core session cookie
The session metadata is stored in the session store. The random part of the session store key is stored on the client in the <wiki-id>Session cookie. (There are a bunch of other cookies, user ID, user name etc. They aren't really relevant.) The session can be anonymous or logged-in.
This mechanism is used on non-SUL wikis, on SUL wikis for anonymous sessions (via CentralAuthSessionProvider inheriting from CookieSessionProvider), and on SUL wikis for logged-in sessions alongside the similar CentralAuth mechanism (since we require a local session for storing application data in the session).
The user token is stored in the session metadata, so changing the token invalidates all sessions.
(Technically "non-SUL wikis" should be "non-SUL users"; the same mechanisms are used on non-SUL wikis and for non-unified accounts on SUL wikis. But there shouldn't be any of those anymore.)
Core user token
A hash derived from the user.user_token database field is stored on the client in the <wiki-id>Token cookie. If there was no valid session cookie attached, a local session is automatically created and the session cookie set.
The cookie has one-year expiry but in practice is valid until the token is changed (via some kind of security intervention such as password change or invalidateUserSessions.php), and some non-browser-based tools take advantage of this.
Only used for logged-in sessions on non-SUL wikis, and only when the "keep me logged in" checkbox was checked during login.
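To make the "valid until the token is changed" behaviour concrete, here is a minimal sketch of a cookie value derived from a stored user token. The HMAC-SHA256 construction, the `SERVER_SECRET` name and the helper functions are assumptions for illustration only; this is not MediaWiki's actual derivation.

```python
import hashlib
import hmac

# Hypothetical server-side secret; not a real MediaWiki setting name.
SERVER_SECRET = b"example-secret-key"


def cookie_value_for(user_token: str) -> str:
    """Derive the value that would be stored in the Token cookie."""
    return hmac.new(SERVER_SECRET, user_token.encode(), hashlib.sha256).hexdigest()


def cookie_is_valid(cookie_value: str, current_user_token: str) -> bool:
    """Recompute the hash from the current DB value and compare in constant time."""
    expected = cookie_value_for(current_user_token)
    return hmac.compare_digest(cookie_value, expected)


# A cookie issued before a token reset stops validating after the reset,
# regardless of the cookie's own expiry date.
cookie = cookie_value_for("token-before-reset")
print(cookie_is_valid(cookie, "token-before-reset"))
print(cookie_is_valid(cookie, "token-after-reset"))
```

The point of the sketch is that nothing per-cookie is stored on the server: validity depends only on the current database value, which is why a password change or invalidateUserSessions.php cuts off every outstanding cookie at once.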
Bot password cookie
This is pretty much identical to logged-in core sessions without "keep me logged in", except the cookie is called <wiki-id>_BPsession. Initiated via the action=login API (with a bot password), rather than normal login.
Unlike normal sessions, bot password sessions can be associated with grants (and so can have limited permissions).
CentralAuth session cookie
The session metadata is stored in the session store (under a different namespace from core sessions - CentralAuthSessionProvider basically reimplements both CookieSessionProvider and a few parts of SessionManager / SessionBackend). The random part of the store key is stored in the centralauth_Session cookie. (There is another cookie that stores the username; it's mostly just for double-checking things.) The cookie is set on the parent domain (except for wikimedia.org subdomains) so it's shared between all language versions of the same project, and also between the mobile and desktop versions.
Like with core, the central user token is stored in the central session metadata, so changing the token invalidates all sessions.
Used for logged-in sessions on SUL wikis, alongside the core mechanism (the local session is automatically created when the central session exists but the local one doesn't).
The central session ID is also stored in the local session metadata, besides the cookie. (TODO: what exactly is this used for?)
CentralAuth user token
Like the core user token, but uses the central user table (globaluser.gu_auth_token; in plaintext :( ) and the parent domain for the cookie (centralauth_Token).
A notable difference from core is that the token changes on logout (so the user is logged out from all domains and devices).
CentralAuth API token
These are short-lived (10s) one-time tokens, meant for cross-domain logged-in API calls from a browser. The token is obtained via the action=centralauthtoken API, and used via the centralauthtoken=<token> URL query parameter (for the action API) or the Authorization: CentralAuth <token> HTTP header (for the REST API). Tokens are used to construct keys for the token store (in Wikimedia production, the microstash) where some information about the user is stored. That includes the central user token, so changing it invalidates all API tokens (not that it really matters since they are very short-lived).
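The lifecycle described above (issue, redeem once within the TTL, reject if the central user token changed in the meantime) can be sketched roughly as follows. `OneTimeTokenStore`, its method names and the `central_token` key are hypothetical; the in-memory dict merely stands in for the real token store.

```python
import hmac
import secrets
import time
from typing import Optional


class OneTimeTokenStore:
    """In-memory stand-in for the token store (hypothetical names throughout)."""

    def __init__(self, ttl_seconds: float = 10.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: dict = {}

    def issue(self, user_data: dict, now: Optional[float] = None) -> str:
        """Store user data under a fresh random token and return the token."""
        now = time.monotonic() if now is None else now
        token = secrets.token_urlsafe(24)
        self._entries[token] = (now + self.ttl_seconds, user_data)
        return token

    def redeem(self, token: str, current_central_token: str,
               now: Optional[float] = None) -> Optional[dict]:
        """Return the stored user data once, or None if the token is unusable."""
        now = time.monotonic() if now is None else now
        entry = self._entries.pop(token, None)  # pop makes the token single-use
        if entry is None:
            return None
        expires_at, user_data = entry
        if now > expires_at:
            return None
        # A changed central user token invalidates outstanding API tokens.
        if not hmac.compare_digest(user_data["central_token"], current_central_token):
            return None
        return user_data


store = OneTimeTokenStore()
token = store.issue({"user": "Example", "central_token": "abc"}, now=0.0)
print(store.redeem(token, "abc", now=5.0))  # the stored user data
print(store.redeem(token, "abc", now=6.0))  # None: already redeemed
```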
OAuth
OAuth credentials are set up via a dedicated management interface. The interface will either directly return session credentials bound to the acting user (owner-only OAuth) or return intermediary credentials which can be exchanged for session credentials for any user, via the handshake protocol defined in the OAuth spec, which involves that user going through an approval dialog.
Like bot passwords, OAuth sessions are associated with grants.
OAuth 1.0
The client application has four credentials (consumer token, consumer secret, user token, user secret). The two tokens are provided directly in the request (using a variety of mechanisms: part of an Authorization HTTP header, a URL query parameter, a POST parameter); the secrets are used together with the request data itself to calculate a per-request signature, which is attached to the request the same way as the tokens, alongside a few supporting pieces of information (a timestamp, a nonce etc).
The four credentials are valid until revoked (by the user revoking the permission they gave to the application). They are verified on every request.
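As an illustration of the per-request signature, here is a minimal sketch assuming the HMAC-SHA1 signature method from the OAuth 1.0 spec (RFC 5849); whether a given consumer uses HMAC-SHA1 or another method is not stated above. The function names, URL and credential values are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value: str) -> str:
    """Percent-encode everything except the RFC 3986 unreserved characters."""
    return quote(value, safe="")


def sign_request(method: str, base_url: str, params: dict,
                 consumer_secret: str, token_secret: str) -> str:
    """Compute an HMAC-SHA1 signature over the method, URL and parameters.

    `base_url` must not contain a query string; query and body parameters
    belong in `params` together with the oauth_* parameters (except
    oauth_signature). A dict cannot hold repeated parameter names, which
    is a simplification of this sketch.
    """
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), pct(base_url), pct(normalized)])
    # The secrets never travel with the request; they only key the HMAC.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode()
    digest = hmac.new(key, base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


example_params = {
    "oauth_consumer_key": "consumer-token",
    "oauth_token": "user-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1700000000",
    "oauth_nonce": "example-nonce",
    "oauth_version": "1.0",
    "action": "query",
}
signature = sign_request("GET", "https://example.org/w/api.php",
                         example_params, "consumer-secret", "user-secret")
print(signature)
```

Because the timestamp and nonce are part of the signed data, every request carries a different signature even though the four credentials stay the same.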
OAuth 2.0
The client application has an access token. Requests are identified by the HTTP header Authorization: Bearer <access token>. Owner-only access tokens are valid forever; other access tokens are relatively short-lived (4 hours for Wikimedia wikis). The client can also have a refresh token, which is long-lived (1 year for Wikimedia wikis) and can be exchanged for a new access token at any time via an API call (that doesn't require an authenticated session).
Tokens can become invalid abruptly, when the user revokes the permission they gave to the application. That's verified on every request.
The OAuth spec doesn't restrict the format of the access token; it's intended to be opaque to the client/user. Wikimedia's implementation follows the common practice of using a JWT that includes basic information about the user, and some rate limiting information. This is used outside MediaWiki, in the API gateway (see T392647#10802093).
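To show what "transparent" means for a service outside MediaWiki, here is a rough sketch of reading the claims out of a bearer JWT. It deliberately skips signature verification, which a real gateway would have to do before trusting anything in the payload. The function names and the claim names in the demo token are assumptions, not the actual fields of Wikimedia's tokens.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_claims_unverified(authorization_header: str) -> dict:
    """Extract the JWT payload from an 'Authorization: Bearer <jwt>' value.

    No signature check is performed here; this only demonstrates that the
    payload is readable without calling back into MediaWiki.
    """
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        raise ValueError("expected 'Bearer <header>.<payload>.<signature>'")
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Hand-built demo token with made-up claims and a dummy signature.
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "none"}).encode()),
    b64url_encode(json.dumps({"sub": "12345", "example_claim": "x"}).encode()),
    "dummy-signature",
])
print(read_claims_unverified("Bearer " + demo_token))
```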
NetworkSession
Tokens are arbitrary secrets defined in site configuration, sent in the Authorization header. This type of authentication is only used for internal server-to-server requests which go through the service mesh rather than the normal traffic routing for external requests.
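A minimal sketch of this kind of check, assuming a simple mapping from account name to shared secret. The `CONFIGURED_TOKENS` structure, the account names and the `NetworkSession` scheme string are illustrative assumptions, not the extension's actual configuration format.

```python
import hmac
from typing import Optional

# Hypothetical site configuration: account name -> shared secret.
CONFIGURED_TOKENS = {
    "ExampleServiceA": "secret-a",
    "ExampleServiceB": "secret-b",
}


def user_for_authorization(header_value: str) -> Optional[str]:
    """Return the account whose configured secret matches the header, if any."""
    scheme, _, presented = header_value.partition(" ")
    if scheme != "NetworkSession" or not presented:
        return None
    matched = None
    # Check every entry with a constant-time comparison so that timing
    # does not reveal which secret (if any) was close to matching.
    for username, secret in CONFIGURED_TOKENS.items():
        if hmac.compare_digest(presented.encode(), secret.encode()):
            matched = username
    return matched


print(user_for_authorization("NetworkSession secret-b"))
print(user_for_authorization("NetworkSession wrong"))
```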
Ability to use arbitrary tokens:
- Session-ish cookies (core session cookie, core user token cookie, CentralAuth session cookie, CentralAuth user token cookie, bot password cookie): yes, could be replaced with any value as long as the session ID / user token / user token hash is recoverable from it (e.g. a JWT with the ID/hash + other data). The JWT could also be a separate cookie; we are already using several cookies (session ID, username, user ID...).
- CentralAuth API token: yes, this is an arbitrary token, fully under our control, and short-lived so easy to change the format.
- NetworkSession: yes, tokens are arbitrary secrets stored in site configuration.
- OAuth 2: the tokens are already JWTs and there's a hook (OAuthClaimStoreGetClaims) for adding fields to the JWT data, so no change needed.
That leaves OAuth 1, which seems to be the only tricky case. The spec doesn't prescribe anything about access tokens so no reason they couldn't be JWTs, but some refactoring would be needed:
- Access tokens are generated in Consumer::saveAuthorization() which is easy to change.
- Access tokens are stored in the oauth_accepted_consumer.oaac_access_token DB field, which is a 32-character string, so storing JWTs (which are much longer) would require a schema change.
- Access tokens never expire, which limits their usefulness as JWTs. The spec does allow for expiry, so we could change that, but it's a B/C break; at that point just asking people to use OAuth 2 might not be much more effort. (Although many OAuth 1 based tools use the /authenticate endpoint, which can get a new access token every time without user interaction, so for those tools it would probably be fine.)
Alternatively, we could keep the DB as is and convert from the DB string to a JWT on the fly, e.g. in ConsumerAcceptance::loadFromRow() or getAccessToken() (and un-convert in ConsumerAcceptance::newFromToken()). Since the access token has to match exactly, the JWT would have to be a deterministic function of the DB value, so it would not be very useful for rate limiting etc., but we could at least store a user ID in it.
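A rough sketch of what such an on-the-fly conversion could look like, assuming an HS256-signed JWT with a fixed header and canonical JSON encoding so that the same DB value always produces the same token. `SIGNING_KEY`, the function names and the use of the `jti` and `sub` claims are all hypothetical choices for illustration.

```python
import base64
import hashlib
import hmac
import json

SIGNING_KEY = b"example-signing-key"  # hypothetical server-side key


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _segment(obj: dict) -> str:
    # Sorted keys and fixed separators keep the encoding byte-for-byte stable.
    return b64url_encode(json.dumps(obj, sort_keys=True, separators=(",", ":")).encode())


def _sign(signing_input: str) -> bytes:
    return hmac.new(SIGNING_KEY, signing_input.encode(), hashlib.sha256).digest()


def db_token_to_jwt(db_token: str, user_id: int) -> str:
    """Wrap the stored access token in a JWT; no timestamps, so it is deterministic."""
    header = _segment({"alg": "HS256", "typ": "JWT"})
    payload = _segment({"jti": db_token, "sub": str(user_id)})
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{b64url_encode(_sign(signing_input))}"


def jwt_to_db_token(token: str) -> str:
    """Verify the signature and recover the stored access token for the DB lookup."""
    header, payload, signature = token.split(".")
    if not hmac.compare_digest(b64url_decode(signature), _sign(f"{header}.{payload}")):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(payload))["jti"]


stored = "0123456789abcdef0123456789abcdef"  # 32-character DB value
jwt = db_token_to_jwt(stored, 42)
print(jwt == db_token_to_jwt(stored, 42))  # same input, same token
print(jwt_to_db_token(jwt) == stored)      # round-trips to the DB value
```

Leaving out claims like issue time or expiry is what keeps the output stable, which is also why this variant cannot carry anything that changes over time.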