Page MenuHomePhabricator

Interface to manage account links to external sites
Open, Needs TriagePublic

Description

Back when I reworked GoogleLogin to use the AuthManager framework, and @Anomie asked for feedback, I asked for an interface to manage links from a local wiki account to be external accounts. This task should track this, I think the most important comment from T110294: Update GoogleLogin to use AuthManager is:

  • An extension still has to handle a "connection" between the local wiki account and the identifier of the external site (GoogleLogin uses the user_google_user table for that). However, it would be cool to have a central table for it, and an interface that allows an extension to specify the type of the connection (e.g. google_auth) and the interface handles the connection between multiple external site account connections and the one local wiki account (the same what GoogleLogin does with the GoogleUser class and the user_google_user table, but more global and better thought). What do you think about that? It would make it easier for Linkproviders to create/manage links between accounts :)

That table would need to be something like

CREATE TABLE /*_*/user_account_links (
    ual_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
    ual_user int unsigned NOT NULL, -- user_id or an ID from CentralIdLookup, same for all types.
    ual_type int unsigned NOT NULL, -- FK to ualt_id
    ual_external_account_identifier ??? NOT NULL
);
CREATE INDEX /*i*/ual_user_type ON /*_*/user_account_link (ual_user,ual_type);
CREATE UNIQUE INDEX /*i*/ual_external_account ON /*_*/user_account_link (ual_type,ual_external_account_identifier);

CREATE TABLE /*_*/user_account_link_types (
    ualt_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
    ualt_type varchar(?) NOT NULL
);
CREATE UNIQUE INDEX /*i*/ualt_type ON /*_*/user_account_link_types (ualt_type);

The troublesome bit is that ual_external_account_identifier field. Is it a string (e.g. an email address), or an integer, or a GUID, or some JSON blob with multiple fields? How big does the field need to be? Can we sensibly apply the shown uniqueness constraint on it to help prevent multiple accounts linked to the same external account?

Event Timeline

I implemented an OAuth2 based PrimaryAuthenticationProvider, and ran into the same thing.

I like your proposed schema, but I don't think we need the user_account_link_types table. Just leave it as a string in the user_account_links table, that comes directly from the PHP. Maybe if the tables are giant we can denormalize...?

I think ual_external_account_identifier should be a string type for flexibility, e.g. if you store remote usernames instead of user ids. In any case, I would add an extra BLOB column that is not indexed, and can be used by the provider to store whatever they want in it, and provide convenient accessors/settors to do the JSON serialization.

I like your proposed schema, but I don't think we need the user_account_link_types table. Just leave it as a string in the user_account_links table, that comes directly from the PHP. Maybe if the tables are giant we can denormalize...?

The trend lately has been to denormalize such things. We even have MediaWiki\Storage\NameTableStore now to make it easy to handle on the MediaWiki side of things.

I think we'd rather not wait for tables to get giant before we start to denormalize strings like this, although the DBAs have the final decision on that.

DBA: I don't know that we'd actually use these proposed tables in WMF production, since it would depend on us enabling an extension that uses it. If we did, the ualt_type would likely be a short string (e.g. "google" or "GoogleLogin"), with one distinct value per enabled extension using the proposed tables. The user_account_links table would have one row for each linked account (e.g. if 50000 users each link 2 external accounts, that would be 100000 rows), and would somehow be centralized for our SUL wikis.

I like your proposed schema, but I don't think we need the user_account_link_types table. Just leave it as a string in the user_account_links table, that comes directly from the PHP. Maybe if the tables are giant we can denormalize...?

The trend lately has been to denormalize such things. We even have MediaWiki\Storage\NameTableStore now to make it easy to handle on the MediaWiki side of things.

I think we'd rather not wait for tables to get giant before we start to denormalize strings like this, although the DBAs have the final decision on that.

+1 I would prefer denormalize things from the start, so we don't have to worry about this and we don't have to rush to denormalize things if it gets big enough, as doing it when we already have stuff in production is normally harder (not to mention that we'd need to get someone to work back on this that might have some other priorities at the time)

DBA: I don't know that we'd actually use these proposed tables in WMF production, since it would depend on us enabling an extension that uses it. If we did, the ualt_type would likely be a short string (e.g. "google" or "GoogleLogin"), with one distinct value per enabled extension using the proposed tables. The user_account_links table would have one row for each linked account (e.g. if 50000 users each link 2 external accounts, that would be 100000 rows), and would somehow be centralized for our SUL wikis.

While I don't see 100k rows as too problematic, I would prefer if we go for the safest option now that we are sort of defining the data model rather than changing it when it is in production.
I rather have more small tables than less tables but bigger.

One thing I'd like to see (and maybe it can happen as part of T215046: RfC: Use Github login for mediawiki.org) is an abstract base class for external providers where you only need to plug in the specifics of how to query that provider. So if it is possible to standardize how external auth data is stored, that would be nice. I'm skeptical though - what are the chances that we want to store the exact same set of data for all such providers? Or maybe we should just assume that such data never needs to be searched and the DB is basically just needed as a key-value store, and store data in some serialized format?

LSobanski subscribed.

Removing the DBA tag and subscribing myself instead. Once there are specific actions for DBA please re-add us and/or @mention me.