Page MenuHomePhabricator

Authentication data should not be available through the normal DB abstraction layer
Open, MediumPublic

Description

Currently, sensitive authentication data (passwords, user tokens, 2FA secrets etc) is stored the same way any other data is, which makes it vulnerable to stealing via injection attacks. This is a known problem (T122375: Segment sensitive data within WMF cluster (tracking)); the plan used to be move it into a separate service (T140813 / T120484), but that got deprioritized, is not on the roadmap anymore, and IMO has a bad cost/benefit ratio anyway (also does not do anything about fixing the issue in standalone MediaWiki).

Instead, we could do segmentation on a budget:

  • store all authentication data in a separate table (or set of tables), with a separate DB user to read/write it; make sure the "usual" DB users ($wgDBuser etc) do not get access to it
  • define a separate auth-tables-only loadbalancer service (same as DBLoadBalancerFactory just with different user/password)
  • code that accesses auth data has to use DAO classes; only those classes can access the auth-tables-only loadbalancer service (easy to ensure via CI, or just code review).

That would drastically reduce the security review surface for SQL injection, to the point of practically eliminating that threat. It wouldn't offer much defense against remote code execution, but that's not really feasible to defend against, anyways.

Event Timeline

If you want to generically store "authentication data", then don't try to make a "separate auth-tables-only loadbalancer service" and "code that accesses auth data has to use DAO classes". Make an interface specifically for accessing authentication data with no reference to SQL or DAO at all. It'll probably wind up looking much like BagOStuff. Then the back-end implementation of that can use a separate DB connection with a separate password, or a bespoke service, or whatever.

If you want to have a "secure" SQL connection that doesn't share access with other SQL connections, you still don't need a whole separate DBLoadBalancerFactory. Just make use of getExternalLB() which already has the ability to be configured to connect with different credentials per "cluster". I doubt requiring DAO on top of it would really gain you a whole lot; you'd probably be better served by having each extension use its own "cluster" even if all those clusters just log in to the same database with different credentials.

If you want to generically store "authentication data", then don't try to make a "separate auth-tables-only loadbalancer service" and "code that accesses auth data has to use DAO classes".

It is a generic need in the sense that there are many places where it is applicable. (Core user tokens, core passwords, CentralAuth user tokens, CentralAuth passwords, OATH secrets, maybe bot passwords, maybe OAuth secrets; also anyone could write their own extension that needs it.) It is not generic in the sense that we necessarily want to abstract away differences between those places, IMO.

Make an interface specifically for accessing authentication data with no reference to SQL or DAO at all. It'll probably wind up looking much like BagOStuff. Then the back-end implementation of that can use a separate DB connection with a separate password, or a bespoke service, or whatever.

That does not seem like a useful abstraction. Why would we want to use anything other than SQL? That's where MediaWiki stores all its primary data.
It would also mean all kinds of data would be meshed together in the same table which is a problem when doing things like mass token resets where speed is critical and you might want to operate on the DB level.

If you want to have a "secure" SQL connection that doesn't share access with other SQL connections, you still don't need a whole separate DBLoadBalancerFactory. Just make use of getExternalLB() which already has the ability to be configured to connect with different credentials per "cluster".

Thanks, wasn't aware of that.

I doubt requiring DAO on top of it would really gain you a whole lot; you'd probably be better served by having each extension use its own "cluster" even if all those clusters just log in to the same database with different credentials.

It would gain easy code review as all code at risk of having SQL injection vulnerabilities (for a given DB column) would be in a single place.

MariaDB supports per-column privileges so it would even be doable without a schema change.

I guess the scariest aspect would be that auth data changes cannot use the implicit transaction of the request. That's already the case with CentralAuth, but unless passwords/tokens are moved to a separate table, registration would be messy as the user record would have to be updated before the implicit commit that would normally create it.

It is a generic need in the sense that there are many places where it is applicable. (Core user tokens, core passwords, CentralAuth user tokens, CentralAuth passwords, OATH secrets, maybe bot passwords, maybe OAuth secrets; also anyone could write their own extension that needs it.) It is not generic in the sense that we necessarily want to abstract away differences between those places, IMO.

What differences?

That does not seem like a useful abstraction. Why would we want to use anything other than SQL? That's where MediaWiki stores all its primary data.

Why did T140813 and T120484 exist?

It would also mean all kinds of data would be meshed together in the same table which is a problem when doing things like mass token resets where speed is critical and you might want to operate on the DB level.

Mass token resets are already trivial with $wgAuthenticationTokenVersion.

It would gain easy code review as all code at risk of having SQL injection vulnerabilities (for a given DB column) would be in a single place.

It would probably be in a single place anyway. The safety is in not having other DB connections even be able to access it, not in DAO.

What differences?

That they ultimately store very different things (for example token leaks are only a security risk only as long as the tokens are used; password hash leaks are problematic even if the password is not used anymore, so it's important to be able to mass-delete them when e.g. the wiki switches to a different method of authentication.

Why did T140813 and T120484 exist?

To move the entire password handling logic beyond the service barrier so that sensitive data is safe even in the case of a successful remote code execution attack. That requires a completely different interface, though.

Mass token resets are already trivial with $wgAuthenticationTokenVersion.

Usually, but there might be reasons not to do that (e.g. only a small fraction of users is affected, or the DB values of the tokens have leaked).

It would probably be in a single place anyway. The safety is in not having other DB connections even be able to access it, not in DAO.

Other connections not being able to access means there is less code with potential for auth data disclosure via SQL injections, which makes security review easier. Not having to hunt for that code through the 5000-line User.php makes it even more easy.
Anyway, using separate DAO classes or not is not really central to the idea and could be a follow-up discussion.

chasemp triaged this task as Medium priority.Dec 9 2019, 5:12 PM