Page MenuHomePhabricator

Split user table to multiple tables
Open, Needs TriagePublic

Description

(this is only an idea, and actually contains multiple things to do)

Purpose:

FieldNew table
user_idkept
user_namekept
user_real_nameuser_properties (see T306248)
user_passworduser_password
user_newpassworduser_password
user_newpass_timeuser_password
user_emailuser_email
user_touchedkept
user_tokenuser_token (optional, potentially with T65354?)
user_email_authenticateduser_email
user_email_tokenuser_email
user_email_token_expiresuser_email
user_registrationkept
user_editcountuser_editcount
user_password_expiresuser_password
user_is_tempkept

This introduces 4 (or 3) new tables: user_password, user_email, user_editcount and (potentially) user_token.

Note:

  • user_password, user_email should never have rows for temporary accounts and system accounts; user_token should never have rows for system accounts.
  • In Wikimedia projects user_password, user_email (and potentially user_token) table should be empty (but potentially contains some rows for non-SUL accounts for now) since passwords and emails should be managed via CentralAuth
  • user_editcount will only have rows for users with edits (so lacking a row implies an edit count of zero). Since 90% of Wikimedia accounts have no edits this table will be significantly smaller than user table. If we have an index of this table it may also solve T344782: New special page to list users by highest edit count.
  • we need to first migrate local emails to CentralAuth
  • currently CentralAuth does not provides an email address confirmation mechanism, and we need to make one
  • optionally I also propose to split the email and password related fields of CentralAuth globaluser to new tables, to reduce the database footprint of temporary users

Event Timeline

Potentially allows users to have multiple email addresses registered to an account too... Which can give extra recovery options...

I think there's a task somewhere about storing passwords in a different table, if not a completely different database/cluster..

I think there's a task somewhere about storing passwords in a different table, if not a completely different database/cluster..

T120484: Create password-authentication service for use by CentralAuth. In my opinion this can be done in several steps:

  1. Introduce new table to store passwords in MediaWiki instances
  2. Migrate remaining local passwords to CentralAuth so in SUL wiki we have an empty password table
  3. Introduce table to store CentralAuth password
  4. Move CentralAuth password storage table to a new cluster
  5. Create a service to handle CentralAuth password (that task)

I think there's a task somewhere about storing passwords in a different table, if not a completely different database/cluster..

T183420: Authentication data should not be available through the normal DB abstraction layer

Note: Currently isSystemUser() checks "A user is considered to exist as a non-system user if it can authenticate, or has an email set, or has a non-invalid token." (https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/user/User.php$2273) and resetting a token simply replacing it with another random valid one, not an invalid one (T17294#9558467). In my task description invalid token does not exist anymore (system user will have no token at all, as are expired temporary accounts) so we can not check system users in this way. Instead I propose to check them via a special username format (T214722#8845922).

I really like the idea. We probably need to iron out some details since whatever changes we will be stuck with for a long time. For the start I actually recommend doing user_editcount first since: 1- most users doesn't have edits 2- we should shard per user similar to site_stats to reduce the lock contention caused by bots editing too fast (or cat-a-lot) and 3- this remove large locks on user table since a lot just lock the user row for updating edit count.