
Decide how to expose session information outside of MediaWiki
Closed, Resolved · Public

Description

To limit the impacts of scraping, we want to empower infrastructure layers in front of MediaWiki (e.g. Varnish or Envoy) to make throttling decisions that take into account the identity of the client (e.g. is it a trusted bot?). This requires being able to partially interpret session tokens used by MediaWiki, before MediaWiki processes the request.
(Partially as in, we need to be able to verify that a given session token was valid in the recent past and which user it belonged to. We don't necessarily need to be able to verify that it is still valid.)
For an overview of MediaWiki session tokens, see T392633: [WE5.5.3 research spike] Inventory of current MediaWiki session authentication mechanisms.

There are multiple ways in which this could be done:

Session info endpoint

Create a session info endpoint in MediaWiki. This could be a REST API, or a PHP entry point for a slight speedup. It runs Setup.php (which, among other things, sets up SessionManager and uses it to identify the user based on the session providers various MediaWiki extensions have registered, and autocreates the user if needed), and returns basic information about the user (central user ID, whether they have the highratelimit right, whether the request is performed through an OAuth app etc).

Infrastructure layers before MediaWiki (e.g. Varnish or Envoy) can fork an incoming request to this endpoint to obtain information relevant for throttling. Since MediaWiki request initialization is relatively expensive, this cannot be done for every request, and it would be that layer's responsibility to persist that information during subsequent requests from that client, e.g. in the form of an edge-managed session cookie.
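
A minimal sketch of that edge-side flow, written in Python purely for illustration. The cookie name `edge_session`, the five-minute TTL, and the `fetch_session_info` callable (standing in for forwarding the request to the session info endpoint) are all assumptions, not existing names.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical TTL: how long the edge trusts cached session info.
CACHE_TTL_SECONDS = 300


class EdgeSessionCache:
    """Remembers session info per edge-managed cookie value."""

    def __init__(self, fetch_session_info: Callable[[Dict[str, str]], dict]):
        # fetch_session_info stands in for calling the (hypothetical)
        # MediaWiki session info endpoint with the client's cookies.
        self._fetch = fetch_session_info
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def lookup(self, cookies: Dict[str, str],
               now: Optional[float] = None) -> Tuple[dict, str]:
        """Return (session_info, edge_cookie_value) for a request."""
        now = time.time() if now is None else now
        edge_id = cookies.get("edge_session")
        entry = self._cache.get(edge_id) if edge_id else None
        if entry and now - entry[0] < CACHE_TTL_SECONDS:
            return entry[1], edge_id  # cheap path: no MediaWiki call
        # Expensive path: ask MediaWiki, then remember the answer.
        info = self._fetch(cookies)
        edge_id = edge_id or secrets.token_urlsafe(16)
        self._cache[edge_id] = (now, info)
        return info, edge_id
```

The caller would send the returned value back as a Set-Cookie header. Nothing in this sketch notices a logout followed by a login as a different user within the TTL, which is the out-of-sync weakness listed below.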

Strengths:

  • Minimal amount of work (at least on the MediaWiki side), could be done with <100 lines of code.
  • Single solution that is completely decoupled from session handling options / changes in MediaWiki.

Weaknesses:

  • Slight performance hit for new sessions (Setup.php is 10% of the processing time for the average REST API request)
  • Too expensive to call on every request, so only useful to the extent the edge can identify requests as coming from the same client. E.g. for a scraping botnet that uses a very large, nonspecific IP pool and a nonspecific user agent, it wouldn't really help.
  • Whatever persistence mechanism is used outside of MediaWiki can get out of sync. E.g. the user logs out and logs back in as a different user, but the old edge-managed session cookie is still present so the edge layer sees no reason to invoke the session info endpoint again, and still sees the previous user identity.
  • Probably requires all clients to honor Set-Cookie headers, which might not be a given.

Turn all session tokens into JWTs

OAuth 2 session tokens (the access tokens that the client needs to provide in the Authorization header) are JWTs which can be decoded and verified by any part of the infrastructure that has access to the OAuth public key, and contain some user information (expiry, central user ID, rate limits etc). There is no way to forge these tokens without the private key. All the other session mechanisms use random tokens which are opaque to the user; these tokens could be given a similar structure.
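
To show what "structure" means here, the following Python sketch splits a compact JWT into its three dot-separated segments and reads the claims. It deliberately skips signature verification, which needs the public key and a crypto library, so a real consumer must verify before trusting anything it reads. The claim values are made up; `sub` and `exp` are the standard JWT claim names, and whether MediaWiki puts the central user ID in `sub` is an assumption.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_unverified(token: str) -> dict:
    """Return the claims of a compact JWT WITHOUT checking the signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Build a fake token just to have something to decode.
# The third segment is a placeholder, not a real signature.
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "12345", "exp": 1760000000}).encode())
token = f"{header}.{payload}.c2lnbmF0dXJl"

claims = decode_unverified(token)
print(claims["sub"], claims["exp"])
```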

Strengths:

  • Various parts of the infrastructure could interpret and partially validate session tokens on their own, without any dependency on MediaWiki or a central service. (Partially as in, it couldn't account for invalidation mechanisms for once-valid tokens, such as logout or the revocation of OAuth authorization.)
  • Performance hit is minimal (just the cost of verifying a JWT signature).
  • Reuses (sort of) an existing mechanism - this is already done by the API gateway for OAuth 2.
  • Standardizing session token structure seems like a good evolution for MediaWiki core generally, easy to make use of this capability in non-Wikimedia installations.
  • Standardizing session token structure could allow treating non-MediaWiki-centric applications (e.g. Gerrit) the same way at the edge as long as their authentication mechanisms can be configured to output a similar token.
  • Conceptually decouples session creation (you need the JWT private key) and session validation (you only need the public key), which is a big step towards T391784: Gradually isolate mediawiki authentication code and infrastructure (ie. limiting the extent of damage an attacker can do with a short-lived remote code execution exploit in MediaWiki).

Weaknesses:

  • Significant amount of work. There are nine different kinds of session tokens currently used by MediaWiki; we might introduce new ones in the future. For some of them the change might be nontrivial (e.g. require a DB schema change because the token gets larger).
  • Even if all session tokens use the same structure, there is still a lot of diversity in where these tokens might be found. (Authorization header, various cookies, query parameters...) Cookie parsing can be awkward in some layers of the infrastructure (e.g. Varnish).
  • If a request contains multiple session tokens (e.g. both an OAuth header and a session cookie), it's hard to ensure that MediaWiki and other parts of the infrastructure agree on how to prioritize them.
  • Coupled with MediaWiki session handling logic. Future changes to session handling, new mechanisms etc. will need to take infrastructure requirements into account.
  • Session token size increases significantly, which means requests get larger.
  • Some disruption to users as old tokens become invalid. Some of these tokens (notably user token cookies and OAuth 1 access tokens) are meant to be long-lived.
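
The token-location and prioritization problems above can be made concrete with a small Python sketch. It assumes header names are already normalized, checks the Authorization header before cookies, and uses `session_jwt` as a placeholder cookie name. That priority order is an assumption for illustration and may not match how MediaWiki ranks its session providers, which is exactly the disagreement risk described above.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

# Hypothetical list of JWT-bearing cookie names, in priority order.
COOKIE_NAMES = ("session_jwt",)


def find_session_token(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (source, token) for the first token found, or None."""
    # 1. Bearer token in the Authorization header.
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return ("authorization", value.strip())
    # 2. Known cookies, in the configured order.
    cookies = SimpleCookie()
    cookies.load(headers.get("Cookie", ""))
    for name in COOKIE_NAMES:
        if name in cookies:
            return (f"cookie:{name}", cookies[name].value)
    return None
```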

Turn some session tokens into JWTs, deprecate the rest

Like the previous option, but only for a small number of recommended authentication mechanisms; probably OAuth 2 and one session cookie. (Which could be one of the existing cookies, e.g. the CentralAuth user token cookie, but it's probably better to just define a new cookie and to make sure every cookie-based session provider outputs and verifies it. Eventually that cookie might replace other session mechanisms: T354910: Create simple MediaWiki session handler for remote login.) Everything else gets treated as unauthenticated, in terms of rate limiting.

Paired with a migration effort of important tools from non-recommended authentication mechanisms (OAuth 1 and bot passwords).

Strengths:

  • Preserves some of the benefits of the previous proposal: decentralized, performant.
  • Small amount of work - OAuth 2 already uses JWTs, user token cookies are arbitrary and we might want to change them anyway for security reasons (see e.g. T209586: Make the stored session id be a hash of the used session id to isolate them or the introduction of a HMAC step in User::getToken()).
  • The migration effort would be somewhat valuable on its own. OAuth 1 is a very outdated protocol, few other sites use it. It doesn't have an equivalent of OIDC so we are maintaining a homegrown OIDC-on-OAuth-1 protocol, which is not ideal.

Weaknesses:

  • Still coupled with MediaWiki session logic
  • More confusing rate-limiting behavior from a user point of view
  • Does not really help with decoupling session creation and session validation, which would be the main motivation for the previous option
  • The migration effort might be significant - most of the community tooling uses OAuth 1, and bot passwords were created specifically to avoid having to change the code of old unmaintained bots.

Event Timeline

Some thoughts on implementation:

  • Session info endpoint: this is just an API handler or PHP entry point that calls WebRequest::getSession() and puts its various properties into the response. Can be done in days if not hours.
  • Turn all (or some) session tokens into JWTs:
    • The OAuth 2 access token is already a JWT.
    • For OAuth 1, I think there are two options:
      • Just put JWTs in the oaac_access_token table. Needs a schema change (current column is too short) and probably some sort of migration to convert the existing values.
      • Keep storing short random strings in the table, generate JWTs dynamically. This would 1) require JWT generation to be deterministic, and 2) mean the access token gets invalidated any time the JWT changes because some information included in it changes (e.g. the user is added to a new group). We probably don't want to go there.
    • CentralAuth tokens and NetworkSession tokens are arbitrary and fully under our control, turning them into JWTs should be unproblematic. (For NetworkSession it's probably not really needed since it uses the service mesh so it won't pass through the API gateway. But then, it's trivial to do.)
    • For the various cookie-based schemes, the easiest approach would be to just add a new cookie on top of the existing ones, and put the JWT in it.
    • If we want to simplify cookies, there are two ways to do it, because two different groups of cookies are both needed: one authenticates the user in a way that keeps working when visiting other wikis, the other identifies the data in the session backend that the current wiki associates with the user. So we could:
      • Have a JWT cookie on the parent domain (ie. replace current centralauth_Session and centralauth_Token with a JWT), store the local session ID as a separate cookie. So the same JWT cookie would be shared across different wikis, and it wouldn't change much. (Would we even need a central session backend after this? Maybe not.)
      • Replace the local session ID with a JWT, keep the centralauth_* cookies. In some cases, this JWT could store session data directly, per T394076: Investigate storing anonymous sessions client-side.
      • I guess we could do both of these things at the same time too, not sure if there would be much benefit in it though.

Should probably take a look at T393212: Investigate device bound sessions which also involves JWT session cookies.

Vgutierrez subscribed.

Turn some session tokens into JWTs, deprecate the rest (Option 3) is the most promising from a CDN standpoint:

  • JWTs can be verified directly at the CDN edge (e.g. in HAProxy) using only a public key, enabling immediate throttling decisions.
  • Requests lacking JWTs can be classified and rate-limited more aggressively, giving us control over unidentified or legacy traffic while providing a migration path for trusted clients.

We rely on being able to identify authenticated requests at the CDN to apply certain optimizations to endpoints that serve cacheable data to logged-in users. We should ensure this mechanism remains intact; otherwise we risk degrading cache hit rates for authenticated traffic.

Given that HAProxy can, as @Vgutierrez pointed out, natively decode and verify JWTs, we could in theory validate a JWT at the edge quickly and reject invalid or malformed ones.

I would thus go with option 2 or 3. I don't have strong opinions on which route to take. I would assume reducing the number of tokens could be desirable for simplicity at all layers, so I have a slight preference for option 3.

One thing we'll need to understand, in relation to this, is how we want to treat expired/forged session tokens at the edge: just ignore them/remove them from the request, or let expired ones get to the backend and outright reject the forged ones?

While I don't think that should be part of this task or this decision, it will depend a lot on what implementation route we take.

I tried to summarize the amount of work involved in supporting various token types, and how much they can be expected to be up-to-date.

| session token type | expiry | can be refreshed | difficulty of converting to JWT | difficulty of keeping up to date | notes |
|---|---|---|---|---|---|
| session ID cookie | 30 days | yes | easy (just output a new cookie instead / alongside) | easy (just update the cookie) | |
| bot password session ID cookie | browsing session | yes | easy | probably easy? | mostly identical to normal session ID, but clients might be less reliable in honoring Set-Cookie headers |
| user token cookie | 1 year | yes | easy | easy | only used on non-SUL wikis; can share a JWT with the central session ID cookie |
| central session ID cookie | 30 days | yes | easy | easy | on eTLD+1; probably don't want to convert both this and the normal session ID? |
| central user token cookie | 1 year | yes | easy | easy | on eTLD+1; can share a JWT with the central session ID cookie |
| CentralAuth API token | seconds | no | easy | easy | |
| OAuth 1 access token (normal) | never | no | hard | hard | annoying to handle on the edge due to protocol being unhelpfully flexible. Protocol doesn't support refresh; most clients are interactive and can just send the user to reauthenticate. |
| OAuth 1 access token (owner-only) | never | no | hard | impossible | user expectation is that it just works forever |
| OAuth 2 access token (normal) | hours | sort of | already a JWT | easy | protocol supports automatic token refresh; probably not all clients support it. Doesn't really matter due to short expiry. |
| OAuth 2 access token (owner-only) | never | no | already a JWT | impossible? | protocol supports automatic token refresh but clients very unlikely to support it. User expectation is that it just works forever. |
| NetworkSession token | never | no | easy | hard? | token managed by us via config file. Requests happen via service mesh so probably we don't really care about this token type. |

My first stab at how the roadmap for option 3 would look:

  1. OAuth 2 normal. It already uses JWTs and is already integrated with Envoy so not much to do, we just need to agree on the JWT data structure. JWT consumers in the infrastructure accept that the data in the JWT might be outdated (but the expiry time for these is 4 hours so no big deal).
  2. OAuth 2 owner-only. Same as above, except these are meant to be valid forever. JWT consumers accept that; we document that users need to manually refresh their tokens if their throttling class changes (get promoted to admin, become an Enterprise customer etc). Dealing with abuse would require an explicit deny-list checked at the edge.
  3. CookieSessionProvider. Create a new JWT cookie (<wiki>Jwt?) that contains the session ID and user token. SessionProvider::persistSession() outputs the cookie, provideSessionInfo() validates the JWT and its contents (session ID, user token - at first just check that they match the other cookies). refreshSessionInfo() checks relevant JWT data (e.g. throttling class based on user groups) against internally cached user data so the cookie gets updated on change.
  4. (maybe) NetworkSession. Do we care at all? Requests are made via the service mesh and there's no reason to ratelimit them. If we do care, make a maintenance script for generating JWTs. JWT consumers accept that the token doesn't automatically refresh (these are system users so shouldn't be a problem); can be manually refreshed with a PrivateSettings.php change when needed.
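
The two checks described in step 3 might look roughly like the Python sketch below. It only illustrates the logic and is not MediaWiki code: the claim names `sid`, `utk` and `rlc`, the group names, and the class labels are all invented for the example.

```python
from typing import Dict, Set


def throttling_class(groups: Set[str]) -> str:
    # Hypothetical rule: group names and class labels are placeholders.
    return "high" if groups & {"bot", "sysop"} else "default"


def jwt_matches_cookies(claims: Dict[str, str], session_id: str,
                        user_token: str) -> bool:
    """provideSessionInfo()-style check: the JWT must agree with the other cookies."""
    return claims.get("sid") == session_id and claims.get("utk") == user_token


def jwt_needs_refresh(claims: Dict[str, str], groups: Set[str]) -> bool:
    """refreshSessionInfo()-style check: reissue the cookie if the class changed."""
    return claims.get("rlc") != throttling_class(groups)
```

For example, a user who is added to a privileged group while holding a JWT issued with `rlc` set to `default` would fail the second check, and the cookie would be reissued on their next request.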

Everything else (CentralAuth token, OAuth 1, browser requests with a CentralAuth cookie but no normal cookie) is considered deprecated and treated identically to fully anonymous requests by the infrastructure. We can do these as follow-up work but don't commit to it for now. CA tokens and browser requests with CA cookies only are rare; OAuth 1 is going to be a pain point (most of our community tooling uses it) but a lot of it is from Toolforge so it is going to be rate-limited generously.

Work needed for OAuth 2 is something like 1-2 weeks (we probably want a nice hook system so that other extensions / WMF config can inject extra JWT data). NetworkSession is <1 week (if we need it at all). The cookie stuff, maybe a month?

JWT consumers would be required to interpret two kinds of tokens:

  • Authorization: [Bearer|NetworkSession] <JWT> headers
  • <wiki>Jwt cookies (where <wiki> is the database ID; would have to figure that out from the Host header)

If the latter is problematic, we can JWT-ize the central session cookie instead of the local one (then the cookie name is always centralauth_jwt) but there are various drawbacks to that, I think.

> I would assume reducing the number of tokens could be desirable for simplicity at all layers, so I have a slight preference for option 3.

The big annoyance on the infra side would be OAuth 1, if we want to support everything that exists today. The protocol allows all kinds of places to put the access token:

  • an oauth_token URL query parameter
  • the same parameter in the Authorization header, which needs to be parsed as a comma-separated list of percent-encoded key="value" pairs
  • the same parameter in the POST body, which needs to be parsed as x-www-form-urlencoded
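
A rough Python sketch of what an edge layer would have to do to find the token in all three places. It assumes the Authorization header uses the `OAuth key="value", ...` form from RFC 5849 and that the body is only inspected when its content type is form-encoded. The function name and the lookup order are choices made for this example.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit

# Matches key="value" pairs inside an "Authorization: OAuth ..." header.
_PAIR = re.compile(r'([A-Za-z0-9_]+)="([^"]*)"')


def find_oauth1_token(url: str, authorization: str = "", body: str = "",
                      content_type: str = "") -> Optional[str]:
    """Look for oauth_token in the three places OAuth 1 permits."""
    # 1. Authorization header: comma-separated, percent-encoded pairs.
    scheme, _, params = authorization.partition(" ")
    if scheme.lower() == "oauth":
        for key, value in _PAIR.findall(params):
            if key == "oauth_token":
                return unquote(value)
    # 2. Form-encoded POST body.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/x-www-form-urlencoded":
        values = parse_qs(body).get("oauth_token")
        if values:
            return values[0]
    # 3. URL query string.
    values = parse_qs(urlsplit(url).query).get("oauth_token")
    return values[0] if values else None
```

Reading the POST body at the edge means buffering it, which is part of why this protocol is awkward to support in front of MediaWiki.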

> One thing we'll need to understand, in relation to this, is how we want to treat expired/forged session tokens at the edge: just ignore them/remove them from the request, or let expired ones get to the backend and outright reject the forged ones?

Ideally there would be a soft-expiry (after which the token is rejected/cleared/refreshed/whatever at the endpoint) and a more generous hard-expiry which is put in the JWT; requests with forged or hard-expired tokens would be rejected at the edge.
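
As a sketch of that split, the edge decision could be as small as the Python function below. It assumes signature verification has already happened elsewhere and that the hard expiry is carried in the standard `exp` claim. Treating a missing expiry as "never expires" is an assumption made here to accommodate owner-only tokens, and the action names are illustrative.

```python
from typing import Optional


def edge_action(token_present: bool, signature_valid: bool,
                hard_expiry: Optional[int], now: int) -> str:
    """Decide what the edge does with a request, based on its JWT."""
    if not token_present:
        return "treat_as_anonymous"
    if not signature_valid:
        return "reject"  # forged or malformed token
    if hard_expiry is not None and now >= hard_expiry:
        return "reject"  # past the hard expiry stored in the JWT
    # Soft expiry is not checked here; the endpoint handles it.
    return "forward"
```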

In practice if we want to migrate things that are currently not JWT (like OAuth 1 access tokens or CentralAuth tokens) there needs to be a long transition period during which invalid tokens are ignored at the edge (where we could only give crude errors while at the endpoint they can be formatted appropriately etc).

For the minimal roadmap laid out above, OAuth 2 is already valid JWT (though anything at the edge that inspects the JWT data would have to be careful not to assume the presence of any field not already in it), the cookie would have a name that differentiates it from existing cookies, and NetworkSession tokens are fully under our control, so it would be fine to reject bad tokens from the start.

> I would thus go with option 2 or 3.

Do we want something like a decision record for this, or not worth the effort?

> My first stab at how the roadmap for option 3 would look:

tl;dr version:

  • OAuth 2 will keep working as it is (clients need to put an access token in the Authorization header, the token is a signed JWT with issuance timestamp, user ID, rate limiting information etc). There is no guarantee for anything other than that the information in it was correct at the time of issuance. When this is detrimental to the user (e.g. they have moved to a more permissive rate limiting class since then), it's on them to obtain a new access token. Non-MediaWiki WMF infra has to be appropriately careful when relying on the token. Dealing with abuse (someone obtains a valid access token and then uses it for DDoS or something) would require an explicit deny-list (e.g. by user ID) checked at the edge.
  • Browser requests with session cookies will include an extra cookie (for now, doesn't replace the existing cookies, just gets added alongside them) called <wiki>Jwt which has identical structure to OAuth access tokens, but a short expiry, so data contained in it isn't more outdated than a few hours or so.
  • Every other session token is opaque and non-verifiable, with the assumption that they will get a low rate limit which will be fine for those use cases:
    • NetworkSession is only used for requests going through the service mesh so never gets rate limited;
    • OAuth 1 will be designated as a legacy protocol, and users needing a higher rate limit will be encouraged to switch to OAuth 2 (presumably with some quota allowance for requests coming from Wikimedia Cloud IPs, to make this less disruptive)
    • The rest (CentralAuth token and some edge cases for browser session cookies) is used very rarely so rate limiting won't be an issue.
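
Put together, the classification an edge layer would apply under this plan might look like the Python sketch below. It assumes the JWT has already been verified, that the user ID sits in the standard `sub` claim, and that rate limiting information is in a claim called `rlc`, which is a name invented for this example. The class labels are placeholders too.

```python
from typing import Dict, Optional, Set


def rate_limit_class(verified_claims: Optional[Dict[str, str]],
                     deny_list: Set[str]) -> str:
    """Pick a rate-limit class from already-verified JWT claims (or None)."""
    if verified_claims is None:
        # Opaque tokens (OAuth 1, CentralAuth token, ...) and anonymous traffic.
        return "low"
    if verified_claims.get("sub") in deny_list:
        return "denied"  # explicit deny-list for abused but valid tokens
    # Fall back to a generic class if the token carries no rate limit info.
    return verified_claims.get("rlc", "authenticated")
```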

The plan looks good, my only question is: will we have multiple JWTs potentially? I'm worried it would generate cookies that are too large to handle.

Also, having so many different cookie names would be difficult-ish to verify at the edge, to be honest.

> will we have multiple JWTs potentially?

Hm, normally not but for auth.wikimedia.org that could be a problem.

> Also, having so many different cookie names would be difficult-ish to verify at the edge, to be honest.

You can get the cookie prefix name (DB name) from the Host header but there are all kinds of special cases (the auth domain, wikis where the language code was changed, wikis where for historical reasons the host name and the DB name are unrelated etc) and I imagine duplicating that in multiple infrastructure layers would be painful.

I can think of two approaches:

  • Use the core cookie handling (so the JWT cookie would match the session cookie) but use the same cookie name everywhere (say session_jwt). We prefix session cookies with the DB name, but TBH I am not really sure why. MW core does it to support multiple wikis on the same domain, but Wikimedia production didn't really have a use case for it in the decade or so between the decommissioning of secure.wikimedia.org and the setting up of auth.wikimedia.org. Maybe it was just lack of initiative. Anyway, we do have auth.wikimedia.org now, but could just rely on cookie paths there. That would make the cookie name predictable and the browser would only send one JWT cookie at a time. (It would also improve the handling of the current session cookies, although those are small so it matters less.)
  • Use CentralAuth cookies, ie. have something like centralauth_jwt that's set on the second-level domain and is shared between wikis. That would limit what information can be held in the JWT (since it would have to be wiki-independent) but that's probably fine. We'd still need something for non-CentralAuth wikis though, so twice as much work (which probably still isn't much work).
JTweed-WMF renamed this task from "[WE5.5.3] Decide how to expose session information to infrastructure layers in front of MediaWiki" to "Decide how to expose session information outside of MediaWiki". (Jul 10 2025, 1:46 PM)
JTweed-WMF added a project: FY2025-26 KR 5.1.

> Anyway, we do have auth.wikimedia.org now, but could just rely on cookie paths there.

That would mean the different wikis have separate sessions on auth.wikimedia.org. I'm not quite sure if that would be better or worse than the status quo, but at a minimum it's a risky change, so I'd not do it as long as we have another option that doesn't seem any worse.

So I'll provisionally go with the CentralAuth option (ie. CookieSessionProvider sets a JWT on session_jwt or some other non-prefixed cookie name; CentralAuth modifies the cookie to be on the parent domain); might reassess when actually writing the code for it.