Store CentralAuth password hashes outside the main database cluster
Open, MediumPublic

Assigned To
None
Authored By
csteipp
Dec 5 2015, 12:29 AM

Description

Background

MediaWiki has a modular authentication component (AuthManager), whose default implementation stores account credentials in the local database of that site. The password component is also configurable, allowing smooth migration to stronger hashing algorithms as the industry standard evolves (e.g. T216682: Switch WMF production to Argon2 password hashes).
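
To illustrate the "smooth migration" idea, here is a minimal sketch of upgrading a hash on successful login. It is not MediaWiki's actual password code: the `:kind:salt:digest` format, the `PARAMS` table, and the function names are hypothetical, and PBKDF2 with two iteration counts stands in for "old" and "new" algorithms because it is available in the Python standard library.

```python
import hashlib
import hmac
import os

# Hypothetical parameter sets; "B" stands in for a newer, stronger default.
PARAMS = {"A": 1_000, "B": 10_000}
CURRENT = "B"


def make_hash(password: str, kind: str = CURRENT) -> str:
    """Return a self-describing hash string of the form ':kind:salt:digest'."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PARAMS[kind])
    return f":{kind}:{salt.hex()}:{digest.hex()}"


def verify(password: str, stored: str) -> bool:
    """Check a password, using the type tag to pick the right parameters."""
    _, kind, salt_hex, digest_hex = stored.split(":")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), PARAMS[kind]
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


def login(password: str, stored: str) -> tuple[bool, str]:
    """Verify; on success, re-hash with the current default if the stored type is older."""
    if not verify(password, stored):
        return False, stored
    if stored.split(":")[1] != CURRENT:
        # The plaintext is only available at login time, so this is when upgrades happen.
        stored = make_hash(password, CURRENT)
    return True, stored
```

The point of the sketch is that an upgrade needs write access at login time, which matters later in this task when deciding what a read-restricted design can and cannot do.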

Today, the CentralAuth extension registers an AuthManager implementation that stores account credentials in a separate and centralised database instead.

This database (MySQL/MariaDB) is responsible for storing and retrieving password hashes. Validation of password hashes happens inside the MediaWiki password component, which means MediaWiki requires read access to this database.

Problem statement

If malicious code were to execute on the MediaWiki web servers, any of its databases could be read and leaked, including the separated centralauth database.

To make Wikimedia Foundation sites more resilient to compromise, CentralAuth should validate submitted login credentials against the database without having the ability to batch-read the hashes themselves from the underlying database.

2015 Proposal

Original task description by @csteipp:

Move all password hashes for CentralAuth accounts out of the main centralauth database and into a database only accessible from a single authentication service.

The service will need to handle:

  • Password authentication
    • by implication, it will need to handle new account creation and password resets too
  • Creating and authenticating temporary / forgotten-password tokens
  • (possibly) tokens
  • (possibly) alerting on anomalous request behavior

The service should store password hashes in a format that is no weaker than they are currently stored in CentralAuth.

The service needs high availability (since it will be used for password logins, and possibly token logins)
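
As a rough sketch of the boundary proposed above, the interface below lets callers set and check passwords but offers no way to read hashes back. The class name, method names, and in-memory store are all hypothetical; in a real deployment the isolation would come from a network and database-credential boundary, not from Python attribute hiding, and the hash format would be whatever CentralAuth already uses or stronger.

```python
import hashlib
import hmac
import os


class PasswordService:
    """Hypothetical service boundary: set and check passwords, never return hashes."""

    ITERATIONS = 100_000  # illustrative value, not a recommendation

    def __init__(self) -> None:
        # Stand-in for the isolated data store that only this service can reach.
        self.__hashes: dict[str, tuple[bytes, bytes]] = {}

    def set_password(self, user: str, password: str) -> None:
        """Covers account creation and password reset: store a fresh salted hash."""
        salt = os.urandom(16)
        self.__hashes[user] = (salt, self._derive(password, salt))

    def check_password(self, user: str, password: str) -> bool:
        """Return only a yes/no answer; the stored hash never leaves the service."""
        record = self.__hashes.get(user)
        if record is None:
            # Do comparable work for unknown users so timing reveals less.
            self._derive(password, b"\x00" * 16)
            return False
        salt, expected = record
        return hmac.compare_digest(self._derive(password, salt), expected)

    @classmethod
    def _derive(cls, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, cls.ITERATIONS)
```

A caller such as CentralAuth would then only ever see booleans:

```python
svc = PasswordService()
svc.set_password("alice", "s3cret")
svc.check_password("alice", "s3cret")  # True
svc.check_password("alice", "guess")   # False
```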

Event Timeline

csteipp raised the priority of this task from to Needs Triage.
csteipp updated the task description. (Show Details)
csteipp added subscribers: csteipp, dpatrick.

2-factor authentication would be part of this component?

2-factor authentication would be part of this component?

Two-factor authentication secrets should be stored in a more secure segment of the WMF's cluster (like this task is for password hashes), but this task is specifically about moving the password authentication pieces of CentralAuth into a more secure location, so that a single SQL injection can't be used to get everyone's password hashes.

Implementing 2FA for CentralAuth wikis in general is a separate task (and what the Security Team is doing next quarter instead of this).

Pictures from our initial whiteboarding of the service, and some considerations for building it.

Overview:
IMG_20160509_135821.jpg (3×4 px, 5 MB)

(checklist of stuff to do) | (potential phases) | (dataflow) | COTS considerations

Checklist of Stuff to do (these should be made into tickets):

IMG_20160509_135739837.jpg (4×3 px, 3 MB)

  • COTS comparison
  • Language choice
  • Data store decision
  • Convert hash formats to strong formats

Phases (potential)

IMG_20160509_135736643.jpg (4×3 px, 2 MB)

Phase 1

  • Password Authentication via REST API
  • Ex:CentralAuth as initial consumer
  • TLS (server authenticated)
  • Swagger spec

Phase 2

  • MediaWiki Core as a consumer (for non-SUL wikis at the WMF, 3rd party users)
  • OATH secret storage API
  • mutual TLS authentication
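
The difference between the two TLS phases can be sketched with Python's standard `ssl` module. This is only an illustration of server-authenticated versus mutual TLS; the function name and its parameters are hypothetical, and the certificate paths would be supplied by whoever deploys the service.

```python
import ssl
from typing import Optional


def make_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    client_ca: Optional[str] = None,
    require_client_cert: bool = False,
) -> ssl.SSLContext:
    """Build a server-side TLS context.

    require_client_cert=False -> Phase 1: only the server proves its identity.
    require_client_cert=True  -> Phase 2: clients must also present a certificate.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # The server's own certificate and private key.
        ctx.load_cert_chain(certfile, keyfile)
    if require_client_cert:
        # Reject any client that does not present a certificate signed by a trusted CA.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca:
            ctx.load_verify_locations(cafile=client_ca)
    return ctx
```

With mutual TLS, only hosts holding a client certificate (for example the MediaWiki app servers) could reach the API at all, which complements firewalling.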

Dataflow diagrams

IMG_20160509_135728382.jpg (3×4 px, 2 MB)

IMG_20160509_135721782.jpg (3×4 px, 2 MB)

IMG_20160509_135828.jpg (3×4 px, 4 MB)

COTS Comparison

IMG_20160509_135712838.jpg (4×3 px, 2 MB)

  • API - simple / REST preferable
  • Hash and encrypt secrets
  • Easy operations with WMF expertise
  • FOSS
  • Multi-DC cluster, data management tools
  • Backups and Disaster Recovery
  • Easy Integration - time to integrate less than time to build
  • Secure transport (encryption, authentication of server, client authentication would be nice)

Notes from a meeting between @dpatrick, @csteipp, @Pchelolo and myself: https://docs.google.com/document/d/1-sNsDhJl1KCqOX9De5uTwFypnBaNOB_XzKo9Al8l4bI/edit

Action items:

  • Gabriel: Check what is needed to support legacy password types in node library
    • Consider using passport library, especially for later steps
  • Gabriel: Talk to ops about storage solutions and hardware
    • MySQL vs. Cassandra
    • 2-3 physical nodes per DC for redundancy and ease of firewalling
  • All: Coordinate via task, and make final call after getting a better handle on overall effort & timelines.

Tokens (user.user_token / globaluser.gu_auth_token) would probably also need to be moved as they are password-equivalent login methods. (see also T50698) Since we are also using them for session invalidation and a service would be too slow for that, that functionality probably needs to be split.

See T140813#2644463 for some considerations about password reset.

Krinkle renamed this task from Create password-authentication service for use by CentralAuth to Store CentralAuth password hashes outside the main database cluster. Apr 8 2025, 5:24 PM
Krinkle updated the task description. (Show Details)
Krinkle subscribed.

Note: the new task description focuses on moving user passwords to a dedicated database cluster, which overlaps with T183420: Authentication data should not be available through the normal DB abstraction layer (this task can be considered either a subtask of it or its CentralAuth equivalent).

The current shared auth domain contains (or will soon contain) three different functionalities, all implemented as part of MediaWiki core or extension:

  • A "session" app which stores user sessions; local wikis will use it for autologin.
  • An "auth" app to handle password login (and 2FA), similar to https://idp.wikimedia.org/login (note this part is not relevant to temporary users since they have no password).
  • A "credential management" app to handle change of credential (password and 2FA setting).

There is a proposal mentioned in T348388: SUL3: Use a dedicated domain for login and account creation (the version before December 6, 2024 is archived at T348388#10555811) that may go a step further than this. It may propose introducing 2-3 new web apps (or one app with 2-3 features) that may completely replace the current CentralAuth-based implementation.

@Bugreporter The task objective has not changed, which is to store passwords outside the reach of the main MediaWiki database credentials. Previously, it specified one specific solution (build a new service) on the assumption that all parties involved knew the problem/motivation. I've clarified the description to state that problem.

Note: the new task description focuses on moving user passwords to a dedicated database cluster, which overlaps with T183420: Authentication data should not be available through the normal DB abstraction layer (this task can be considered either a subtask of it or its CentralAuth equivalent).

That task is a more modest proposal (could be thought of as the first step), which if implemented would protect against SQL injection in queries unrelated to passwords (since they would use a different DB connection) but not against malicious code execution (since MediaWiki would still have access to the password-specific DB connection).

In hindsight I'm not convinced it's worth doing as a standalone step since the risk of SQL injection is pretty low in the first place (especially with all the changes that were made to the RDBMS abstraction layer since then).

The task objective ... is to store passwords outside the reach of the main MediaWiki database credentials.

I think more specifically it was to store them outside the reach of MediaWiki. And

Validation of password hashes happens inside the MediaWiki password component, which means MediaWiki requires read access to this database.

might or might not have been intended.

If you allow read access, the protection against RCE attacks is pretty limited (you can limit the scale, but RCE attacks are not script kiddie level so the attacker is going to be smart, and a smart attacker would probably be going after specific accounts rather than all hashes which aren't terribly useful anyway).

Also, if you only allow read access, the service will have to implement all the mechanisms which involve writing passwords (hash upgrade, password change, registration) and at that point not having it implement hash validation seems like a weird design decision.

Also also, as I noted in T120484#2645877 and T140813#2644463, a service that meaningfully protects against RCE is a really hard problem as there are many workflows where an attacker would have equivalent impact (creating or reading a temporary password, writing or reading a local or central user token or a local or central session ID, being able to create a bot password or an OAuth consumer, being able to change the email address) and those all would have to be isolated.

So IMO the specific problem statement in this task is not that useful.