Back when I reworked GoogleLogin to use the AuthManager framework and @Anomie asked for feedback, I asked for an interface to manage links from a local wiki account to external accounts. This task should track that. I think the most important comment from T110294: Update GoogleLogin to use AuthManager is:
Event Timeline
I implemented an OAuth2 based PrimaryAuthenticationProvider, and ran into the same thing.
I like your proposed schema, but I don't think we need the user_account_link_types table. Just leave it as a string in the user_account_links table that comes directly from the PHP. Maybe if the tables get giant we can normalize...?
I think ual_external_account_identifier should be a string type for flexibility, e.g. if you store remote usernames instead of user ids. In any case, I would add an extra BLOB column that is not indexed and can be used by the provider to store whatever it wants, with convenient accessors/setters to handle the JSON serialization.
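Pulling the suggestions above together, the table might look roughly like this. This is only an illustrative sketch: the column names (other than ual_external_account_identifier, shortened here to ual_external_id), types, and indexes are my assumptions, not an agreed design.

```sql
-- Hypothetical sketch of the proposed user_account_links table.
-- ual_provider holds the short type string directly (no separate
-- user_account_link_types table), as suggested above.
CREATE TABLE /*_*/user_account_links (
  ual_user INT UNSIGNED NOT NULL,          -- local user id (user.user_id)
  ual_provider VARBINARY(64) NOT NULL,     -- e.g. 'google' or 'GoogleLogin'
  ual_external_id VARBINARY(255) NOT NULL, -- remote user id or username
  ual_data BLOB NULL,                      -- provider-defined JSON, not indexed
  PRIMARY KEY (ual_user, ual_provider, ual_external_id)
) /*$wgDBTableOptions*/;

-- Lookup by external identity, e.g. during login.
CREATE INDEX /*i*/ual_provider_external
  ON /*_*/user_account_links (ual_provider, ual_external_id);
```

The composite primary key allows a user to link several accounts per provider while preventing exact duplicates; a real schema change would of course go through the usual review.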
The trend lately has been to normalize such things. We even have MediaWiki\Storage\NameTableStore now to make it easy to handle on the MediaWiki side of things.
I think we'd rather not wait for tables to get giant before we start normalizing strings like this, although the DBAs have the final say on that.
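If the type strings were normalized from the start, NameTableStore would handle the string-to-id mapping and its caching. A rough sketch of how that could look; note that the 'user_account_link_types' table name is hypothetical, and a real patch would need to register it with NameTableStoreFactory first:

```php
<?php
// Illustrative only: assumes a hypothetical 'user_account_link_types'
// name table has been registered with NameTableStoreFactory.
use MediaWiki\MediaWikiServices;

$typeStore = MediaWikiServices::getInstance()
	->getNameTableStoreFactory()
	->get( 'user_account_link_types' ); // hypothetical table name

// acquireId() inserts the string on first use and caches the mapping,
// so callers never deal with the id table directly.
$typeId = $typeStore->acquireId( 'GoogleLogin' );

// Later, map a stored id back to the provider string.
$typeName = $typeStore->getName( $typeId );
```

Rows in user_account_links would then store the integer id instead of the string, which is the pattern the core tables using NameTableStore follow.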
DBA: I don't know that we'd actually use these proposed tables in WMF production, since it would depend on us enabling an extension that uses it. If we did, the ualt_type would likely be a short string (e.g. "google" or "GoogleLogin"), with one distinct value per enabled extension using the proposed tables. The user_account_links table would have one row for each linked account (e.g. if 50000 users each link 2 external accounts, that would be 100000 rows), and would somehow be centralized for our SUL wikis.
+1. I would prefer to normalize things from the start, so we don't have to worry about this later or rush to normalize once it gets big enough; doing it when we already have data in production is usually harder (not to mention that we'd need to pull someone back onto this who might have other priorities at the time).
While I don't see 100k rows as too problematic, I would prefer to go for the safest option now, while we are still defining the data model, rather than changing it once it is in production.
I'd rather have more small tables than fewer, bigger ones.
One thing I'd like to see (and maybe it can happen as part of T215046: RfC: Use Github login for mediawiki.org) is an abstract base class for external providers where you only need to plug in the specifics of how to query that provider. So if it is possible to standardize how external auth data is stored, that would be nice. I'm skeptical, though: what are the chances that we want to store the exact same set of data for all such providers? Or maybe we should just assume that such data never needs to be searched, treat the DB as basically a key-value store, and store the data in some serialized format?
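The split described above might look roughly like the following. Every name and method here is hypothetical, just to illustrate the idea: the base class owns the storage and treats the extra data as an opaque serialized blob, while subclasses supply only the provider-specific details.

```php
<?php
// Hypothetical sketch: not an existing MediaWiki class.
abstract class ExternalAccountAuthenticationProvider {

	/** Short provider key stored with the link, e.g. 'google'. */
	abstract protected function getProviderKey(): string;

	/**
	 * Query the remote provider; returns [ externalId, extraData ],
	 * where extraData is an arbitrary provider-specific array.
	 */
	abstract protected function fetchRemoteIdentity( array $credentials ): array;

	/** Actual DB write against user_account_links; elided here. */
	abstract protected function storeLink(
		int $userId, string $provider, string $externalId, string $dataJson
	): void;

	/**
	 * Persist a link. The extra data is never searched, so it is
	 * serialized to JSON and stored in an unindexed blob column.
	 */
	public function linkAccount( int $localUserId, array $credentials ): void {
		[ $externalId, $extraData ] = $this->fetchRemoteIdentity( $credentials );
		$this->storeLink(
			$localUserId,
			$this->getProviderKey(),
			$externalId,
			json_encode( $extraData )
		);
	}
}
```

Under this split, the open question in the comment above reduces to whether fetchRemoteIdentity() can return a common enough shape across providers, since the storage side stays provider-agnostic either way.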