Standardize handling of remote / external users in MediaWiki
Open, Needs Triage, Public

Description

Given a user ExampleUser who does something on remotewiki and this affects localwiki, MediaWiki has five ways of representing the user on localwiki:

  1. With ExampleUser's local user account on localwiki, if it exists and can be connected to their account on remotewiki (e.g. the wikis are part of the same wiki farm which uses some sort of shared login system); i.e. with a UserIdentity with some non-zero user ID (and probably a username of ExampleUser, although it's technically possible to make a shared login system which attaches differently named accounts to each other).
  2. With an "interwiki" account, i.e. a UserIdentity with ID 0 and username remote>ExampleUser, where the remote prefix is the interwiki prefix referencing remotewiki (which is usually another wiki in the same wiki farm, although this is not technically required, e.g. the prefix can be manually specified when importing an XML dump whose source can be outside the wiki farm). This can get somewhat complex because there is no guarantee that any wiki in a farm can refer to any other wiki via an interwiki prefix. For example, on the Wikimedia farm, enwikisource has no interwiki prefix for dewiki and would use w:de>ExampleUser, where w is the interwiki prefix for enwiki, and de is enwiki's interwiki prefix for dewiki (which enwikisource doesn't know about, since its own de interwiki prefix points to dewikisource). There is some level of support for this kind of "stacked" interwiki prefix, see e.g. ExternalUserNames::getUserLinkTitle().
  3. With a wiki ID based external name, i.e. a UserIdentity with ID 0 and username remotewiki>ExampleUser, where remotewiki is the wiki ID.
  4. With the SUL name, i.e. a UserIdentity with ID 0 and username ExampleUser.
  5. With a numeric central ID, provided by CentralIdLookup.
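
To make the string-based forms (#1 to #4) concrete, here is a minimal sketch in Python rather than MediaWiki's PHP. The function names and labels are hypothetical and are not MediaWiki APIs. It assumes only what the list states: the `>` separator, ID 0 for non-local users, and `:` joining stacked interwiki prefixes.

```python
from typing import Optional, Tuple

SEPARATOR = ">"  # separator used in the names above, e.g. "remote>ExampleUser"


def split_external_name(name: str) -> Tuple[Optional[str], str]:
    """Split 'prefix>User' at the first separator; prefix is None for bare names."""
    prefix, sep, rest = name.partition(SEPARATOR)
    if not sep:
        return None, name
    return prefix, rest


def describe(user_id: int, name: str) -> str:
    """Label an (ID, name) pair with the representation it most likely uses."""
    if user_id != 0:
        return "local account"  # form 1
    prefix, _ = split_external_name(name)
    if prefix is None:
        return "bare name without a source wiki"  # form 4
    if ":" in prefix:
        return "stacked interwiki prefix"  # form 2, e.g. "w:de"
    # Forms 2 and 3 have the same shape; the string alone cannot tell them apart.
    return "prefixed external name (interwiki prefix or wiki ID)"
```

The last branch is the point of the sketch: without extra knowledge about the farm, a name such as remotewiki>ExampleUser does not say whether its prefix is an interwiki prefix or a wiki ID.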

I think #1 is used fairly consistently when possible, but the other options are a bit of a mess. #2 is used for content import (where there isn't necessarily any shared authentication system connecting the two wikis, and the source wiki might not even be run by the same operator as the target wiki); #3 is used e.g. by GlobalBlocking; #4 is used within the authentication framework; #5 is used by some components/extensions which want to rely on a shared user database, like BotPasswords or OAuth.

The expressive power is different as well. #2 isn't necessarily able to reference every other wiki in the wiki farm (and when it can, often only with the "stacked interwiki prefix" syntax that's hard to introspect or programmatically generate), but it can refer to outside wikis. #3 can refer to any other wiki in the farm in a way that's easy to convert to a cross-wiki DB lookup, but cannot refer to wikis outside the farm. #4 doesn't have a concept of a source wiki at all (useful in an authentication context where you might have a shared user registry, which might live outside of MediaWiki). #5 doesn't have the concept of a source wiki either and uses numeric IDs instead of strings (useful for DB efficiency, although this functionality predates the actor table and might be obsoleted by it).

Also, some of these uses aren't always safe. The interwiki prefix is not always validated and could be anything, possibly conflicting with a wiki ID prefix. CentralIdLookup has a concept of two accounts being the same (or more precisely, of the local account being attached to the central account), but the prefix-based usernames do not: they usually just assume that the same username on different wikis means the same user, which might or might not actually be true.
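
The prefix conflict can be illustrated with a small check. This is a sketch only: `possible_readings` is a hypothetical helper, and the prefix and wiki ID sets in the usage note are made-up data, not the configuration of any real farm.

```python
from typing import FrozenSet, Set


def possible_readings(
    prefix: str,
    interwiki_prefixes: FrozenSet[str],
    wiki_ids: FrozenSet[str],
) -> Set[str]:
    """Return every way an external-name prefix could be interpreted."""
    readings: Set[str] = set()
    # For a stacked prefix such as "w:de", only the first hop is known locally.
    first_hop = prefix.split(":", 1)[0]
    if first_hop in interwiki_prefixes:
        readings.add("interwiki")
    if prefix in wiki_ids:
        readings.add("wiki_id")
    return readings
```

With the made-up sets `frozenset({"w", "de", "dewiki"})` for interwiki prefixes and `frozenset({"enwiki", "dewiki"})` for wiki IDs, the prefix `dewiki` has both readings, so dewiki>ExampleUser is ambiguous. A prefix that matches neither set returns an empty set, which is the unvalidated case.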

It would be great to have one canonical way to refer to external users, which would not be a string or integer but an object/interface (ExternalUserIdentity or some such) for flexibility, future-proofing and self-documentation; and which could be converted to the other formats easily, and with some guarantee of correctness, to the extent possible.
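
One possible shape for such an object, as a rough sketch and not a proposed API: only the name ExternalUserIdentity comes from this description, and every field and method below is an assumption made for illustration. It is written in Python for brevity; an actual implementation would be a PHP interface in MediaWiki.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExternalUserIdentity:
    """Hypothetical value object for a user who acted on another wiki."""
    name: str
    wiki_id: Optional[str] = None           # farm-internal wiki ID, if known
    interwiki_prefix: Optional[str] = None  # possibly stacked, e.g. "w:de"
    central_id: Optional[int] = None        # set only for attached accounts

    def to_wiki_id_name(self) -> Optional[str]:
        """Form 3 ('remotewiki>ExampleUser'), or None if no wiki ID is known."""
        if self.wiki_id is None:
            return None
        return f"{self.wiki_id}>{self.name}"

    def to_interwiki_name(self) -> Optional[str]:
        """Form 2 ('remote>ExampleUser'), or None if no prefix is known."""
        if self.interwiki_prefix is None:
            return None
        return f"{self.interwiki_prefix}>{self.name}"

    def is_same_user_as(self, other: "ExternalUserIdentity") -> Optional[bool]:
        """Compare by central ID; None means 'unknown', never a guess by name."""
        if self.central_id is None or other.central_id is None:
            return None
        return self.central_id == other.central_id
```

Conversions return None when the target format cannot express the identity, which matches "to the extent possible". Identity comparison deliberately refuses to fall back to matching usernames.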