Gradually isolate mediawiki authentication code and infrastructure
Open, Needs Triage, Public

Description

Virtually everyone agrees that the status quo, in which auth(n) code and infra are not isolated from the rest of MediaWiki (3M+ LOC), is suboptimal in terms of security. We should really reduce the attack surface. Thanks to SUL3, login now mostly happens on the auth.wikimedia.org domain. This makes isolation much easier and illuminates what to do next.

Here is my draft brain dump on changing the floor of the disco while people are dancing:

  • Short term:
    • Set up a dedicated k8s namespace, let's call it mw-restricted and redirect auth.wikimedia.org traffic to it.
    • Fork helm charts and strip out anything that doesn't make sense. In particular, there shouldn't be a way to communicate between mw-restricted and shellbox.
    • Set an env var like MW_RESTRICTED in those containers
    • Stop loading almost all extensions when the env var is set. We don't need the extension that produces hieroglyph images or musical notation in Special:Login.
      • Allowed extensions: CentralAuth, OATHAuth, EmailAuth, ConfirmEdit (more? AbuseFilter?).
      • Going forward, any new extension that needs to be loaded in restricted mode should be treated as higher risk in its security readiness review.
    • Set up a new set of dedicated CI jobs similar to production and maybe even have browser tests for login to make sure nothing breaks if we remove code from being loaded in restricted mode.
    • Maybe fork the mw images and remove unneeded dependencies too, for example php8.1-wikidiff2
    • Fork vendor/ into a new repo like vendor-restricted/ and just remove everything not needed.
    • Fork core's autoload.php into something like autoload-restricted.php and only load classes that are needed during auth(n). It could be as simple as marking such classes with @allow-in-restricted-mode and then letting generateLocalAutoload.php take care of the rest. Finding them shouldn't be hard; we can use arclamp logs.
    • Refactor as much as possible to remove classes from the restricted mode
  • Medium term, once we are sure that all pieces that deal with authentication are handled on auth.wikimedia.org (most notably login via API, change password, and entering "secure mode"):
    • Move gu_password to a dedicated table.
    • Make sure user_password is never written or read. Clean it up.
    • Split the gu_password table into a dedicated cluster, give it a dedicated user and password and only set it in restricted mode (complexity: restricted still needs to read from normal tables, so we probably need two db users, or to allow the restricted db user to have access to centralauth tables).
    • Unset $wmgPasswordSecretKey in non-restricted mode and even better, trigger an exception if it's accessed (I don't know if it's possible).
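
The @allow-in-restricted-mode idea from the short-term list can be sketched roughly as follows. This is only an illustration in Python of what a generator script might do, not how generateLocalAutoload.php actually works. The regular expression, the function name `restricted_class_map`, and the sample file paths and contents are all assumptions made for the example.

```python
import re
from typing import Dict

TAG = "@allow-in-restricted-mode"

# A docblock that is immediately followed by a class declaration.
# The tempered dot (?:(?!\*/).)* keeps the match inside a single docblock.
ANNOTATED_CLASS = re.compile(
    r"/\*\*(?P<doc>(?:(?!\*/).)*)\*/\s*"
    r"(?:(?:final|abstract)\s+)*class\s+(?P<name>\w+)",
    re.DOTALL,
)


def restricted_class_map(sources: Dict[str, str]) -> Dict[str, str]:
    """Map class name -> file path for classes whose docblock carries TAG."""
    class_map = {}
    for path, php_source in sources.items():
        for match in ANNOTATED_CLASS.finditer(php_source):
            if TAG in match.group("doc"):
                class_map[match.group("name")] = path
    return class_map


# Hypothetical file contents, for illustration only.
SAMPLE_SOURCES = {
    "includes/auth/AuthManager.php": (
        "<?php\n/**\n * Handles login.\n * @allow-in-restricted-mode\n */\n"
        "class AuthManager {}\n"
    ),
    "includes/parser/Parser.php": (
        "<?php\n/**\n * Wikitext parser.\n */\nclass Parser {}\n"
    ),
}

print(restricted_class_map(SAMPLE_SOURCES))
```

Only the annotated class ends up in the restricted map; everything else would simply be unknown to the autoloader in restricted mode.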

Event Timeline

Stop loading almost all extensions when the env var is set. We don't need the extension that produces hieroglyph images or musical notation in Special:Login.

This was tried in T373737 and didn't work.

As an alternative, auth.wikimedia.org can use a dedicated built app instead of MediaWiki. See T120484#10724394

Aklapper renamed this task from Gradually isolate mediawiki authenication code and infrastructure to Gradually isolate mediawiki authentication code and infrastructure. Apr 13 2025, 10:11 PM

If I were trying to pivot, one thing I'd try would be to write something to the db or cache that might later get executed, e.g. anything still using PHP unserialize() or Mustache templates. So one thing that might make sense here is to set a different $wgSecretKey between auth and normal (for Mustache) [or to make a new var just for that], and to make sure every call to unserialize() uses the second argument to limit class types.
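
For readers less familiar with the second argument of PHP's unserialize() (the `allowed_classes` option), here is a rough analog in Python, sketched with the standard library's pickle module. It is an illustration of the allow-list idea only, not MediaWiki code. The names `ALLOWED_CLASSES`, `AllowListUnpickler` and `safe_loads` are made up for the example.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Only these (module, class name) pairs may be instantiated while loading.
# The choice of OrderedDict is arbitrary and purely illustrative.
ALLOWED_CLASSES = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any class outside the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_CLASSES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"class {module}.{name} is not allowed")


def safe_loads(data: bytes):
    """Deserialize data, rejecting payloads that reference other classes."""
    return AllowListUnpickler(io.BytesIO(data)).load()


# An allowed class round-trips normally.
restored = safe_loads(pickle.dumps(OrderedDict(user="example")))

# A payload referencing any other class is rejected before instantiation.
try:
    safe_loads(pickle.dumps(Fraction(1, 2)))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

The point is the same as with unserialize(): data read back from the db or cache should only ever be able to produce a small, known set of types.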

Split the gu_password table into a dedicated cluster, give it a dedicated user and password and only set it in restricted mode (complexity: restricted still needs to read from normal tables, so we probably need two db users, or to allow the restricted db user to have access to centralauth tables).

I think this would likely have an outsized impact, just because it totally eliminates the possibility of reaching the passwords via SQL injection, and if you find an SQL injection, it is such an easy thing to exploit. [I suppose that's less of a concern with encrypted passwords.]

Also, once the globalusers table is secure, the per-wiki user_token probably becomes the next most likely target (I'm assuming it's still used; my knowledge might be a bit out of date on how modern CentralAuth works). It might be worth thinking about securing that as well.

At the very least, setting $wgAuthenticationTokenVersion (or a new equivalent variable) to some secret, so that the Token cookie ends up being a mix of both a database secret (user_token) and an in-memory secret instead of just an in-database secret, might be a prudent defense against SQL-injection-style leaks of the user_token field.
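
A minimal sketch of that "mix of a database secret and an in-memory secret" idea, using an HMAC. This is not MediaWiki's actual token derivation. The function names and the sample values are hypothetical, and the in-memory secret stands in for whatever config variable would hold it.

```python
import hashlib
import hmac


def cookie_token(user_token: str, memory_secret: bytes) -> str:
    """Derive the cookie value from the DB token and a config-only secret."""
    return hmac.new(
        memory_secret, user_token.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def verify_cookie_token(
    presented: str, user_token: str, memory_secret: bytes
) -> bool:
    """Constant-time check of a presented cookie value."""
    expected = cookie_token(user_token, memory_secret)
    return hmac.compare_digest(presented, expected)


# Hypothetical values, for illustration only.
db_user_token = "0123456789abcdef"          # what an SQL injection could leak
config_secret = b"only-in-config-not-in-db"  # never stored in the database

valid_cookie = cookie_token(db_user_token, config_secret)

# An attacker who only knows user_token has to guess the in-memory secret.
forged_cookie = cookie_token(db_user_token, b"attacker-guess")
```

With this shape, leaking user_token alone is not enough to forge the cookie; the attacker would also need the in-memory secret.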

Unset $wmgPasswordSecretKey in non-restricted mode and even better, trigger an exception if it's accessed (I don't know if it's possible).

Ideally it would not just be unset but not be present in non-restricted containers at all. I think in a threat scenario where an attacker can read arbitrary variables, it's reasonable to conclude they can probably read the raw contents of the config files too.
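
One way to picture both halves of this (fail loudly on access, and keep the secret out of non-restricted containers entirely) is a loader that reads the key from a file that only exists in restricted containers. This is a sketch under assumptions: the value "1" for MW_RESTRICTED, the idea of a separate secret file, and the names `SecretUnavailableError` and `load_password_secret` are all invented for the example.

```python
from pathlib import Path
from typing import Mapping


class SecretUnavailableError(RuntimeError):
    """Raised when the password secret is requested where it must not exist."""


def load_password_secret(env: Mapping[str, str], secret_file: Path) -> bytes:
    """Return the secret only in restricted mode; raise everywhere else."""
    # Assumption: restricted containers set MW_RESTRICTED=1.
    if env.get("MW_RESTRICTED") != "1":
        raise SecretUnavailableError(
            "password secret requested outside restricted mode"
        )
    # Assumption: the file is mounted only into restricted containers,
    # so even a full config dump elsewhere would not contain the key.
    if not secret_file.is_file():
        raise SecretUnavailableError(f"secret file {secret_file} is not mounted")
    return secret_file.read_bytes().strip()
```

In a real deployment one would call this with `os.environ` and the mounted path; the exception makes accidental use in non-restricted code visible in logs instead of silently returning an empty key.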

Thanks to SUL3, login now mostly happens on the auth.wikimedia.org domain.

API login still happens via the regular URLs. (You can use auth.wikimedia.org, but we haven't made any effort to get clients to do it. For some, e.g. the apps, it would take considerable time. Some are unmaintained etc.)

change password, and entering "secure mode"

These are now redirected to auth.wikimedia.org automatically. That can be overridden with usesul3=0, which I think we want to keep for a couple of months at least, just to make bug investigations easier (was it caused by SUL3 or not?).
The one exception is WebAuthn where the old URL for the management interface is still in use for now, since only there can you disable old passkeys. See T376021: Migrate WebAuthn on Wikimedia wikis to central domain. We'll probably remove that after a month or so.

The short-term and medium-term proposals seem swapped to me, in terms of effort. More importantly, I'm not sure how much actual security improvement any of these changes provide.

Separating passwords seems like a straightforward change (with significant wall time, but not a ton of development time) - create a password getter/setter service (in the MediaWikiServices sense), replace the ten or so instances in the code where the password is read from / written to the DB, create a migration version of the service which uses two tables, do the migration. That's T183420: Authentication data should not be available through the normal DB abstraction layer. It prevents password exfiltration via SQL injection (unless the vulnerable query is the one going to the password table). Also it's not Wikimedia-specific, and could easily be made default in the future for new installations, which is nice. But:

  • Passwords are encrypted, so actually SQL injection in itself already doesn't work. An attacker with shell access (which is needed to get the encryption key) can probably access passwords / DB credentials more directly and doesn't need to rely on SQL injection.
  • Lots of other vaguely password-equivalent things are stored in the DB. (User tokens, TOTP secrets, temporary passwords, bot passwords. An attacker with write access could also e.g. change the email for a takeover.)
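
Caveats aside, the getter/setter service with a "migration version" described above could look roughly like this. It is a sketch with in-memory dicts standing in for the two tables. The class and method names are hypothetical and are not MediaWiki's service API.

```python
from typing import Dict, Optional


class DictPasswordStore:
    """Stand-in for one table holding password hashes, keyed by user id."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}

    def get_hash(self, user_id: int) -> Optional[str]:
        return self.rows.get(user_id)

    def set_hash(self, user_id: int, password_hash: str) -> None:
        self.rows[user_id] = password_hash

    def delete(self, user_id: int) -> None:
        self.rows.pop(user_id, None)


class MigratingPasswordStore:
    """Reads new-then-old, writes new-only, and clears the old copy."""

    def __init__(
        self, legacy: DictPasswordStore, dedicated: DictPasswordStore
    ) -> None:
        self.legacy = legacy
        self.dedicated = dedicated

    def get_hash(self, user_id: int) -> Optional[str]:
        found = self.dedicated.get_hash(user_id)
        if found is None:
            # Not migrated yet: fall back to the legacy table.
            found = self.legacy.get_hash(user_id)
        return found

    def set_hash(self, user_id: int, password_hash: str) -> None:
        self.dedicated.set_hash(user_id, password_hash)
        # Once written to the new table, the old row must not linger.
        self.legacy.delete(user_id)


legacy_table = DictPasswordStore()
legacy_table.set_hash(1, "old-hash")
store = MigratingPasswordStore(legacy_table, DictPasswordStore())
```

Callers only ever see `get_hash` / `set_hash`, so the ten or so call sites can be switched once and the migration can then proceed behind the service.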

Restricting what code can be accessed on auth.wikimedia.org seems like a major ongoing maintenance burden. It would make it harder to read/write cookies or perform XSS for that domain (relative to other domains). But there isn't that much difference security-wise in which domain an attacker can access.

Full isolation would require rebuilding the authentication stack entirely outside MediaWiki. That's pretty complex. It would have to include

  • passwords
  • user tokens
  • temporary passwords
  • bot passwords
  • OAuth access tokens
  • maybe CentralAuth tokens (although they are so short-lived they are hard to abuse)
  • email changes
  • sending out password reset emails

A past, very partial proposal to build such a service was T140813: Protect sensitive user-related information with a UserData / auth / session service. A (sadly underdocumented) attempt to use a third-party service was T304600: Authentication changes.

We don't have a task for that, but I think if we wanted to build something like that, it would have to be an app with its own UI for authentication workflows, rather than a web API.

I think it could roughly go like this:

  • Get rid of the remaining non-central-domain authentication workflows, as proposed in the task description. That would involve migrating bot passwords, OAuth, email changes to the central domain, and a big migration project for API clients.
  • Change the login UI to be less like MediaWiki, and much more minimal. (We considered that for SUL3: T367912: Explore a popup window as the default login flow in MediaWiki under SUL3 / T367913: Explore an iframe overlay as the default login flow in MediaWiki under SUL3 but it was rejected on account of trying to change too many things at once. On its own, I think it would be a fine change. Most of the background work for it did get done: T362706 / T364939.) Do not allow i18n customizations, extensions modifying the signup UI etc.
  • Come up with a session scheme in which MediaWiki can verify but not create valid sessions (probably some form of public-key cryptography, e.g. JWT in a cookie), have it replace the current CookieSessionProvider/CentralAuthSessionProvider. I think this would tie in nicely with the next big work item for MediaWiki Platform, which is making MediaWiki sessions somehow compatible with or interpretable by the traffic layer.
  • Come up with a mechanism where all the less security-critical MediaWiki authentication logic (AbuseFilter, spam blacklist, etc etc etc) can intervene in the login on the central domain, without actually being part of the central domain PHP process. Maybe the central domain login process would make an internal API call to MediaWiki.
  • Once all that is done, you "only" need to rewrite the authentication stack as a standalone app. Or maybe a very very locked down standalone MediaWiki instance, as suggested in the task description (much more awkward, but also much less effort).
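
To make the "verify but not create" property from the session-scheme step concrete, here is a deliberately toy illustration using textbook RSA with the standard library only. It is NOT secure (no padding, tiny key) and is not a proposal for the actual scheme. A real implementation would use a vetted library and a standard format such as a JWT signed with an asymmetric algorithm. All names and the sample payload are made up for the example.

```python
import hashlib

# Toy key built from two known Mersenne primes. Far too small for real use.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q      # public modulus: MediaWiki would know this
E = 65537      # public exponent: MediaWiki would know this
# Private exponent: would live only on the auth service.
D = pow(E, -1, (P - 1) * (Q - 1))


def _digest(payload: bytes) -> int:
    """Hash the payload and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def sign_session(payload: bytes) -> int:
    """Auth service side: needs the private exponent D."""
    return pow(_digest(payload), D, N)


def verify_session(payload: bytes, signature: int) -> bool:
    """MediaWiki side: needs only the public values N and E."""
    return pow(signature, E, N) == _digest(payload)


# Hypothetical session payload, for illustration only.
session = b"user=42;exp=1750000000"
signature = sign_session(session)
```

A compromised verifier holding only N and E can check sessions but cannot mint one for another user, which is the property the cookie-based providers lack today.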

I think "Gradually isolate mediawiki authentication code and infrastructure" makes a good sub-epic of T122375: Segment sensitive data within WMF cluster (tracking), but the things listed in the task description are just very specific subsets of the problem and / or specific implementation ideas. @Ladsgroup what do you think about keeping this task as a high-level problem description and turning those into separate tasks?

I think "Gradually isolate mediawiki authentication code and infrastructure" makes a good sub-epic of T122375: Segment sensitive data within WMF cluster (tracking), but the things listed in the task description are just very specific subsets of the problem and / or specific implementation ideas. @Ladsgroup what do you think about keeping this task as a high-level problem description and turning those into separate tasks?

Sure! I'd be grateful