Primary: improve recruitment of new developers. You need a wiki account to use Phabricator and report bugs, the support desk is hard to use without one, you need a wiki account to propose OAuth apps, etc. (And in the future SUL login will hopefully replace even more of our less user-friendly logins; see T189531: All Wikimedia developer services should use single sign-on.) Most developers who work with MediaWiki only occasionally (and there are far more of them than developers who work with it a lot) do not have a Wikimedia SUL account; practically all developers today have a GitHub account. Registering on MediaWiki is not hard, but even a little friction is friction, and it means yet another password to manage, which many users dislike.
Letting you use the GitHub account you probably already use for everything else development-related means less distraction: we can skip the password, captcha and email confirmation, and can probably make the account autoconfirmed immediately if the GitHub user looks established. It also means we can connect GitHub accounts to wiki accounts, which can be used for a number of interesting things, especially once Wikimedia LDAP and SUL accounts are linked (e.g. pulling GitHub pull requests into Gerrit while preserving authorship info).
Secondary: have an active but not very high-profile (and consequently not high-risk) use case where we can evaluate external logins. External logins with a major identity provider such as Google are interesting for some of the other Wikimedia wikis for a number of reasons, and are also a key feature for third-party MediaWiki installations in corporate environments; mediawiki.org seems like the ideal place to polish and evaluate MediaWiki's support for external login providers, as its user base is good at identifying and reporting workflow and usability imperfections.
Measure of success
Every extension, especially an authentication extension, comes with nonzero maintenance cost, security surface and added UX complexity, so we only want to add a login method if it actually improves user recruitment or retention.
The easy thing to measure is what fraction of registrations use GitHub, so we should probably set some threshold above which we consider it worthwhile to have. Ideally we'd also measure increase in total number of registrations, but due to seasonal variations and registration numbers fluctuating a lot even under normal conditions, that might not be practical.
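The headline metric is a simple ratio over a reporting period. A minimal sketch, assuming registrations can be exported with a label for the method used; the `Registration` type, the `method` labels and the 5% cutoff are hypothetical placeholders, since the actual threshold is still to be decided.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real threshold has not been decided.
GITHUB_SHARE_THRESHOLD = 0.05


@dataclass
class Registration:
    username: str
    method: str  # hypothetical labels, e.g. "password" or "github"


def github_share(registrations: list[Registration]) -> float:
    """Fraction of registrations in a period that used GitHub login."""
    if not registrations:
        return 0.0
    via_github = sum(1 for r in registrations if r.method == "github")
    return via_github / len(registrations)


def worth_keeping(registrations: list[Registration],
                  threshold: float = GITHUB_SHARE_THRESHOLD) -> bool:
    """True if GitHub's share of registrations meets the threshold."""
    return github_share(registrations) >= threshold


# Example period: 3 of 40 registrations used GitHub, a share of 0.075.
month = [Registration(f"User{i}", "github" if i < 3 else "password")
         for i in range(40)]
```

This deliberately ignores the harder question of whether GitHub login added registrations or merely shifted existing ones to a new method.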
Since the AuthManager rewrite, MediaWiki has first-class support for external identity providers. Such a provider adds a button to the login form; clicking it redirects you to the external site, where you authorize sharing your account details with MediaWiki (on GitHub and most modern sites this takes the form of an OAuth handshake). When you return, you are logged into your linked wiki account. If you already have a wiki account but it is not linked yet, you can do a normal password login and link it; if you don't have a wiki account at all, you can choose a username and a wiki account linked to the external account is created for you. (Some UI and workflow bits related to this in AuthManager are unfinished, but what needs to be done is well understood.) Multiple external accounts can be linked to the same wiki account, and there is a special page for inspecting and removing linked accounts. If you used external login to create your account, you can set up a password via the normal password change process.
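The linking outcomes above can be modelled as a small state machine. This is a toy sketch, not AuthManager's API: `WikiAccount`, `LinkStore` and all method names are hypothetical, and external IDs are assumed to be namespaced strings such as `github:583231` so one wiki account can hold links from several providers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WikiAccount:
    username: str
    has_password: bool = True
    linked_ids: set[str] = field(default_factory=set)


class LinkStore:
    """Toy model of the account-linking outcomes; all names hypothetical."""

    def __init__(self) -> None:
        self.accounts: dict[str, WikiAccount] = {}
        self.links: dict[str, str] = {}  # external ID -> wiki username

    def on_return(self, external_id: str) -> Optional[str]:
        """After the provider redirects back: the linked username, or None,
        in which case the user either logs in with a password to link an
        existing account or picks a username for a new one."""
        return self.links.get(external_id)

    def link(self, username: str, external_id: str) -> None:
        """Link an external account, e.g. after a normal password login."""
        owner = self.links.get(external_id)
        if owner is not None and owner != username:
            raise ValueError("external account already linked elsewhere")
        self.accounts[username].linked_ids.add(external_id)
        self.links[external_id] = username

    def create_linked(self, username: str, external_id: str) -> WikiAccount:
        """New wiki account with no password; one can be set later."""
        if username in self.accounts:
            raise ValueError("username already taken")
        if external_id in self.links:
            raise ValueError("external account already linked elsewhere")
        self.accounts[username] = WikiAccount(username, has_password=False)
        self.link(username, external_id)
        return self.accounts[username]

    def unlink(self, username: str, external_id: str) -> None:
        """What the special page for removing linked accounts would do."""
        self.accounts[username].linked_ids.discard(external_id)
        if self.links.get(external_id) == username:
            del self.links[external_id]
```

Each external ID maps to at most one wiki account, while a wiki account can carry any number of external IDs, matching the many-to-one relationship described above.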
External login should be thought of as an alternative to the password field: other login/registration checks (throttling, 2FA, blocks/blacklists/edit filters, etc.) are applied just as they would be for a normal login or registration, unless intentionally prevented.
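One way to picture "an alternative to the password field": only the identity-verifying step differs, and the same checks run for every method. This loosely echoes AuthManager's split between primary authentication providers and the pre-authentication and secondary providers around them, but it is a hypothetical sketch, not that API; `LoginAttempt`, `authenticate` and `block_check` are made-up names, and interactive steps such as 2FA are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoginAttempt:
    username: str
    ip: str
    method: str  # hypothetical labels: "password" or "github"


# A check returns an error message, or None to let the attempt through.
Check = Callable[[LoginAttempt], Optional[str]]


def authenticate(attempt: LoginAttempt, primary_ok: bool,
                 checks: list[Check]) -> Optional[str]:
    """Return None on success, else the first error.

    Only the primary step differs between password and external login;
    the same list of checks runs regardless of attempt.method."""
    if not primary_ok:
        return "primary authentication failed"
    for check in checks:
        error = check(attempt)
        if error is not None:
            return error
    return None


# Hypothetical example check: a block applies whatever the login method.
BLOCKED = {"Vandal"}


def block_check(attempt: LoginAttempt) -> Optional[str]:
    return "user is blocked" if attempt.username in BLOCKED else None
```

Skipping a check for external logins (as discussed for captchas and throttling further down) would then be an explicit decision inside a specific check, not an accident of the login path.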
Existing MediaWiki extensions providing GitHub authentication include OAuth2Github and OAuth2 Client, although both seem dated. The easiest approach is probably to clone GoogleLogin, which is well maintained and makes full use of AuthManager; Google and GitHub authentication differ only in some implementation details of the OAuth flow both rely on. Preferably, GoogleLogin would be split into a generic (or generic OAuth-based) external login support layer and a provider-specific layer, to make the task easier for future extension developers.
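The proposed split could look roughly like this: a generic OAuth 2.0 layer that builds the authorization redirect, plus thin provider classes supplying endpoints, scopes and identity mapping. A sketch under assumptions: the class and method names are hypothetical (not GoogleLogin's actual structure), the token exchange step is omitted, and only the endpoint URLs and profile field names (`id`/`login` for GitHub, the OpenID Connect `sub` claim for Google) come from the providers' public documentation.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlencode


class OAuth2LoginProvider(ABC):
    """Hypothetical generic layer: builds the redirect to the provider and
    maps its profile to a normalized identity. Subclasses supply specifics."""

    authorize_endpoint: str = ""
    scopes: tuple[str, ...] = ()
    extra_params: dict[str, str] = {}

    def __init__(self, client_id: str, redirect_uri: str) -> None:
        self.client_id = client_id
        self.redirect_uri = redirect_uri

    def authorization_url(self, state: str) -> str:
        """URL to send the browser to; `state` guards against CSRF."""
        params = {
            "client_id": self.client_id,
            "redirect_uri": self.redirect_uri,
            "scope": " ".join(self.scopes),
            "state": state,
            **self.extra_params,
        }
        return f"{self.authorize_endpoint}?{urlencode(params)}"

    @abstractmethod
    def parse_identity(self, profile: dict) -> tuple[str, str]:
        """Map the provider's profile JSON to (stable ID, display name)."""


class GitHubLoginProvider(OAuth2LoginProvider):
    authorize_endpoint = "https://github.com/login/oauth/authorize"
    scopes = ("user:email",)

    def parse_identity(self, profile: dict) -> tuple[str, str]:
        # GitHub's /user response has a numeric "id" and a mutable "login".
        return f"github:{profile['id']}", profile["login"]


class GoogleLoginProvider(OAuth2LoginProvider):
    authorize_endpoint = "https://accounts.google.com/o/oauth2/v2/auth"
    scopes = ("openid", "email")
    extra_params = {"response_type": "code"}

    def parse_identity(self, profile: dict) -> tuple[str, str]:
        # OpenID Connect: "sub" is the stable subject identifier.
        return f"google:{profile['sub']}", profile.get("email", "")
```

Adding another provider would then mean writing one small subclass rather than cloning a whole extension.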
Comparison of GitHub with other alternatives
GitHub is by far the largest code forge, with 31M developers (8M of whom joined last year) according to its annual report. It is closed source but focuses mainly on supporting open-source projects. The next largest is Bitbucket (6M users, part of the Atlassian suite, not specifically focused on open source). The largest code forge running on open-source software is GitLab (they don't publish a user count, but their organization count is 5% of GitHub's, so at a wild guess about 1-2M users?). To avoid the NASCAR problem (a login form cluttered with a wall of provider logos) we want to minimize the number of external login providers we use, so the more interesting question is how much these user bases overlap. (There is no point in adding GitLab if almost every GitLab user also has a GitHub account.) We don't have a good way of estimating that, though, short of perhaps enabling providers one by one and measuring the additional impact on registrations.
We might find providing GitHub login infeasible or not worthwhile, or GitHub might go out of business or change its practices in ways that make it an unappealing option for us. Thus, we must make sure to avoid lock-in: our users should not depend on GitHub for logging in. That means each of them needs either a password or a valid email address.
The simplest way to ensure this is to get the user's email address from GitHub. That makes registration slightly more complicated (GitHub makes the user click through one permission screen when we only ask for the user ID, and two if we ask for extra permissions, even if that is just seeing the email address), but it is still a decent tradeoff. We would probably have to re-query the email address periodically, so we can update it if the user changes it on GitHub.
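GitHub's REST API lists the authenticated user's addresses at `GET /user/emails` (this needs the `user:email` scope), returning objects with `email`, `primary`, `verified` and `visibility` fields. A minimal sketch of the periodic re-query, assuming we only trust the primary verified address; the function names are hypothetical, and error handling, pagination and rate limits are left out.

```python
import json
import urllib.request
from typing import Optional

EMAILS_URL = "https://api.github.com/user/emails"


def fetch_emails(token: str) -> list[dict]:
    """Call GitHub's GET /user/emails with the user's OAuth token."""
    req = urllib.request.Request(EMAILS_URL, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def primary_verified_email(emails: list[dict]) -> Optional[str]:
    """Pick the address GitHub marks as both primary and verified."""
    for entry in emails:
        if entry.get("primary") and entry.get("verified"):
            return entry["email"]
    return None


def refreshed_email(stored: Optional[str], emails: list[dict]) -> Optional[str]:
    """Value to store after a periodic re-query.

    Keeps the old address if GitHub no longer reports a primary verified
    one, so the wiki account is not left without a recovery address."""
    current = primary_verified_email(emails)
    return current if current is not None else stored
```

Keeping the old address when GitHub reports nothing usable is a design choice in service of the anti-lock-in goal; a stricter policy could instead flag the account for a password prompt.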
Alternatively (or additionally), we can ask users to set a password after a certain amount of activity (similar to T58028: Show Echo web notification (asking users to consider providing an email) to users who don't have an e-mail address associated with their account).
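The trigger for such a prompt reduces to a small predicate: the account has no way in except GitHub, and it has become active enough to be worth protecting. A sketch assuming edit count as the activity measure; `AccountState`, the function name and the threshold of 10 edits are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical activity threshold; the proposal does not specify one.
EDIT_THRESHOLD = 10


@dataclass
class AccountState:
    has_password: bool
    has_confirmed_email: bool
    edit_count: int


def should_prompt_for_password(state: AccountState,
                               threshold: int = EDIT_THRESHOLD) -> bool:
    """True for active accounts whose only way in is the GitHub link."""
    if state.has_password or state.has_confirmed_email:
        return False  # already recoverable without GitHub
    return state.edit_count >= threshold
```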
The security impact is that anyone who controls the GitHub account (be it the legitimate user, someone who stole the account, or GitHub itself) can log into the linked Wikimedia SUL account (the login button would only be available on mediawiki.org, but accounts are still central). For high-risk accounts this can be prevented with two-factor authentication, which we plan to require anyway (see T150898: Force OATHAuth (2FA) for certain user groups in Wikimedia production). For normal accounts ("normal" here including privileged but not particularly sensitive accounts like admins) it is not much of a concern: GitHub's account security is in all likelihood on par with or better than ours (they have three times the funding and staffing, with five different security teams), and GitHub the company has no motivation to abuse the access and a lot to lose by doing so. (Are there social-engineering methods to take over a GitHub account? GitHub's account management policies on renaming, recreation and usurpation make GitHub usernames unsafe to rely on; can we work around that by trusting numeric user IDs instead? Is account recovery also something to be concerned about?)
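Trusting IDs instead of usernames would mean keying the link table on the numeric `id` from GitHub's `/user` response and treating `login` as display-only. A minimal sketch, assuming (as GitHub's API design suggests) that the numeric ID stays with an account across renames and is not handed to whoever later claims a freed username; the function names are hypothetical, and this does nothing about account recovery.

```python
from typing import Optional


def resolve_link(links: dict[int, str], profile: dict) -> Optional[str]:
    """Find the wiki account for a GitHub profile by its numeric ID.

    The login name is ignored for matching: it changes on rename and can
    be claimed by someone else once freed."""
    return links.get(profile["id"])


def refresh_display_login(display: dict[int, str], profile: dict) -> None:
    """Record the current login name purely for showing it in the UI."""
    display[profile["id"]] = profile["login"]
```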
On the positive side, a linked GitHub account would provide another recovery method in case of a lost password or email, or a hacked account. We could also skip various anti-abuse checks we apply to normal registrations (like captchas or throttling, which sometimes cause problems at in-person outreach events); we would basically outsource these to GitHub, where the user already passed them when registering. This assumes GitHub's anti-abuse features are sane (e.g. it verifies email addresses and prevents mass bot registrations) and/or that we vet the GitHub accounts ourselves (e.g. only skip the standard checks for accounts that are not new and show some activity).
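"Not new and with some activity" could be approximated from fields in GitHub's `/user` response, such as `created_at` (an ISO 8601 UTC timestamp) and `public_repos`. A rough sketch: the 180-day age and one-public-repository thresholds are hypothetical, and public repositories are only a crude activity proxy, since contribution history is not part of that response.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds, not decided anywhere in this proposal.
MIN_ACCOUNT_AGE = timedelta(days=180)
MIN_PUBLIC_REPOS = 1


def looks_established(profile: dict, now: datetime) -> bool:
    """Heuristic over GitHub's /user response: old enough, some activity.

    Only accounts passing this would skip captcha/throttling (or be made
    autoconfirmed); everyone else goes through the standard checks."""
    created = datetime.strptime(
        profile["created_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    old_enough = now - created >= MIN_ACCOUNT_AGE
    active = profile.get("public_repos", 0) >= MIN_PUBLIC_REPOS
    return old_enough and active
```

Passing `now` in explicitly keeps the check deterministic and easy to test.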
During login/registration, mediawiki.org learns the user's GitHub ID and email address. GitHub only learns the user's GitHub ID and IP (not their MediaWiki username), but it's relatively easy for them (or someone who can obtain their logs) to connect that to the wiki user via timestamps in the account creation log. Similarly, someone who can monitor the user's traffic can detect the mediawiki.org -> github.com -> mediawiki.org traffic pattern and connect their IP to their wiki identity through the account creation log. This is not a big change as unfortunately similar attacks are already fairly easy. See T216344: Hide account creation/autocreation times for possible mitigation.
IP addresses are personally identifiable information and as such should not be shared with third parties without user consent. Technically there would not be any "sharing" here, any more than when an article includes an external link, but we might nevertheless want to make sure users understand what is happening, for example by providing an information link.
Given all the fear, uncertainty and doubt around external login, I'd like to get consensus on the basic idea before seeking commitment from budget owners or committing volunteer time. I think the AuthManager side is at most a few weeks' work for someone familiar with the codebase, and writing the extension takes a similar amount of time, plus a long tail of potential usability improvements for which there is no time pressure.
- Needs legal and privacy review.
- Since we have a unified login system and other wikis cannot be fully isolated from mediawiki.org login features, they should be consulted.
- Maybe a security review / threat analysis?
- Try to gather more data on user demand (maybe check the overlap between Gerrit and GitHub accounts? survey newly registered mediawiki.org users?)