
RfC: Use Github login for mediawiki.org
Closed, Declined (Public)

Assigned To
None
Authored By
Tgr
Feb 1 2019, 12:20 AM
Referenced Files
F28217752: Screen Shot 2019-02-14 at 19.11.34.png
Feb 14 2019, 7:15 PM
F28217756: Screen Shot 2019-02-14 at 19.01.55.png
Feb 14 2019, 7:15 PM
F28217572: export.png
Feb 14 2019, 7:15 PM
Tokens
"Love" token, awarded by Nux. "Love" token, awarded by Dbrant. "Like" token, awarded by Jdforrester-WMF. "Love" token, awarded by Jdlrobson.

Description

TL;DR: Use a "Log in with GitHub" button on mediawiki.org (but not the other wikis) to grow the developer community faster, and also to serve as a testbed for external login support in MediaWiki.

Use cases

Primary: improve recruitment of new developers. You need a wiki account to use Phabricator and report bugs, the support desk is hard to use without a wiki account, you need a wiki account to propose OAuth apps, etc. (And in the future SUL login will hopefully replace even more of our less-user-friendly logins, see T189531: All Wikimedia developer services should use single sign-on.) Most developers who work a little but not much with MediaWiki (and that's a lot more people than those who work a lot with MediaWiki) do not have a Wikimedia SUL account; practically all developers today have a GitHub account. It's not hard to register in MediaWiki, but even a little friction is friction, and it means yet another password to manage, which many users dislike.

Letting you use your existing GitHub account, which you probably use for everything else development-related, means less distraction (we can skip the password, captcha and email confirmation, and can probably make the account autoconfirmed immediately if the GitHub user looks established). It also means we can connect GitHub accounts to wiki accounts, which can be used for a number of interesting things, especially once the linking of Wikimedia LDAP and SUL accounts happens (e.g. pulling GitHub pull requests into Gerrit while preserving authorship info).

Secondary: have an active but not super high-profile (and consequently not high-risk) use case where we can evaluate external logins. External logins with some major identity provider such as Google are interesting for some of the other Wikimedia wikis for a number of reasons, and also a key feature for third-party MediaWiki installations in a corporate environment; mediawiki.org seems like the ideal place to polish and evaluate MediaWiki's external login provider support as its userbase is good at identifying and reporting workflow or usability imperfections.

Measure of success

Every extension, especially an authentication extension, comes with nonzero maintenance cost, security surface and added UX complexity. So we only want to use an additional login method if it actually improves user retention or recruitment.

The easy thing to measure is what fraction of registrations use GitHub, so we should probably set some threshold above which we consider it worthwhile to have. Ideally we'd also measure increase in total number of registrations, but due to seasonal variations and registration numbers fluctuating a lot even under normal conditions, that might not be practical.

Technical details

Since the AuthManager rewrite, MediaWiki has first-class support for external identity providers. They display a button on the login form, on clicking it you get redirected to the external site, where you authorize sharing your account details with MediaWiki (on GitHub and most modern sites this takes the form of an OAuth handshake), on return you get logged into your linked wiki account; if you already have a wiki account but it's not linked, you can do a normal password login and link it; if you don't have a wiki account at all, you can choose a username and a wiki account linked to the external account is created for you. (There are some UI and workflow bits related to this in AuthManager which are unfinished, but what needs to be done is well understood). Multiple external accounts can be linked to the same wiki account. There is a special page for inspecting and removing linked accounts. If you have used external login to create your account, you can set up a password via the normal password change process.
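The three outcomes described above (log in to a linked account, link an existing account after a password login, or create a new linked account) can be sketched as a small decision function. This is only an illustration of the flow, not AuthManager code: `LinkStore`, `Outcome` and `handle_return` are hypothetical names, and the in-memory dictionaries stand in for whatever storage a real extension would use.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    LOGGED_IN = "logged in to the already linked wiki account"
    LINKED_EXISTING = "existing wiki account linked after password login"
    CREATED = "new wiki account created and linked"


@dataclass
class LinkStore:
    # Hypothetical storage: external account ID -> wiki username.
    # Several external IDs may point at the same wiki username.
    links: dict = field(default_factory=dict)
    wiki_users: set = field(default_factory=set)

    def handle_return(self, external_id, chosen_username, password_ok=False):
        """Decide what happens when the user comes back from the provider."""
        if external_id in self.links:
            # Case 1: the external account is already linked.
            return Outcome.LOGGED_IN, self.links[external_id]
        if chosen_username in self.wiki_users:
            # Case 2: a wiki account exists but is not linked yet;
            # a normal password login is required before linking.
            if not password_ok:
                raise PermissionError("password login required before linking")
            self.links[external_id] = chosen_username
            return Outcome.LINKED_EXISTING, chosen_username
        # Case 3: no wiki account yet; create one under the chosen name.
        self.wiki_users.add(chosen_username)
        self.links[external_id] = chosen_username
        return Outcome.CREATED, chosen_username
```

The other login and registration checks (throttling, 2FA, blocks and so on) are deliberately left out of this sketch; they would run around this step just as they do for a password login.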

External login should be thought of as an alternative to the password field; other login/registration checks (throttling, 2FA, blocks / blacklists / edit filters, etc) are applied as they would be to a normal login/registration, unless intentionally prevented.

Existing MediaWiki extensions providing Github authentication include OAuth2Github and OAuth2 Client, although both seem dated. Probably the easiest way is to clone GoogleLogin, which is well-maintained and fully utilizes AuthManager, and only differs from Github auth in some implementation details of the OAuth process that both sides use for authentication. Preferably, split GoogleLogin into a generic (or generic OAuth-based) external login support layer and a provider-specific layer, to make the task of future extension developers easier.

Comparison of GitHub with other alternatives

GitHub is by far the largest code forge, with 31M developers (8M of whom joined last year) according to its annual report. It is closed source but mainly focusing on supporting opensource projects. The next largest is BitBucket (6M users, part of the Atlassian suite, not specifically opensource-focused). The largest code forge running on opensource software is GitLab (they don't give a user count but their organization count is 5% of GitHub's so at a wild guess about 1-2M users?). To avoid the NASCAR problem we want to minimize the number of external login providers we use, so the more interesting question would be, what is the extent of overlap between these user bases? (No point in adding GitLab if almost every GitLab user also has a GitHub account.) We don't have a good way of guesstimating that, though, short of maybe enabling providers one by one and seeing the additional impact on registrations.

Avoiding lock-in

We might find providing GitHub login infeasible or not worthwhile; GitHub might go out of business or change their practices in ways which make them an unappealing option for us. Thus, we must make sure to avoid lock-in: our users should not depend on GitHub for logging in. That means they either need to provide a password or a valid email address.

The simplest way to solve this is to get their email address from GitHub. That makes registration slightly more complicated (GitHub makes the user click through one permission screen when we just ask for the user ID, and two if we ask extra permissions, even if that's just seeing the email address) but that's still a decent tradeoff. We'd probably have to periodically re-query the email address so we can update it if the user changes it on GitHub.
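The difference between the two permission levels shows up in the authorization URL the user is redirected to. The sketch below assumes GitHub's OAuth web flow as publicly documented (the `https://github.com/login/oauth/authorize` endpoint and the `user:email` scope); those details should be checked against GitHub's current documentation. `build_authorize_url` and all argument values are hypothetical.

```python
from urllib.parse import urlencode

# Assumed from GitHub's OAuth documentation; verify before relying on it.
GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, state, want_email):
    """Build the redirect URL for the first step of the OAuth handshake."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # Random per-request value, checked on return to prevent CSRF.
        "state": state,
    }
    if want_email:
        # Asking for the email address is an extra permission, which is
        # what adds the second confirmation screen mentioned above.
        params["scope"] = "user:email"
    return GITHUB_AUTHORIZE_URL + "?" + urlencode(params)


# Placeholder values for illustration only.
id_only = build_authorize_url("client123", "https://wiki.example.org/cb", "xyz", False)
with_email = build_authorize_url("client123", "https://wiki.example.org/cb", "xyz", True)
```

Periodically re-querying the address would be a separate API call made with the stored access token; it is not shown here.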

Alternatively (or additionally), we can ask users to provide a password after a certain amount of activity (similarly to T58028: Show Echo web notification (asking users to consider providing an email) to users who don't have an e-mail address associated with their account).

Security details

The security impact is that anyone who controls the GitHub account (be that the legitimate user, someone who stole the account, or GitHub itself) can log into the linked Wikimedia SUL account (the login button would only be available on mediawiki.org but accounts are still central). In the case of high-risk accounts this can be prevented with two-factor authentication (something we plan to require anyway, see T150898: Force OATHAuth (2FA) for certain user groups in Wikimedia production); for normal accounts ("normal" here including privileged but not particularly sensitive accounts like admins) it is not much of a concern, since GitHub's account security is in all likelihood on par with or better than ours (they have three times the funding and staffing, with five different security teams), and GitHub the company itself has no motivation to abuse the access, and a lot to lose by it. (Are there social engineering based methods to take over a GitHub account? Their account management policies like renaming, recreation and usurpation make GitHub user names unsafe to rely on; can we work around that by trusting IDs instead? Also is account recovery something to be concerned about?)
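One way to picture the "trust IDs instead of names" idea is to key the account link on the provider's numeric user ID and treat the login name as display-only. This is a sketch under the assumption that the numeric ID stays fixed across renames and is never reassigned, which would need to be confirmed with GitHub; `GitHubIdentity`, `link` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GitHubIdentity:
    user_id: int  # assumed stable across renames (to be confirmed)
    login: str    # user name; can be renamed, recreated or usurped


# Hypothetical link table keyed on the numeric ID only.
links_by_id = {}


def link(identity, wiki_user):
    links_by_id[identity.user_id] = wiki_user


def resolve(identity):
    """Return the linked wiki user, ignoring the (unsafe) login name."""
    return links_by_id.get(identity.user_id)


link(GitHubIdentity(42, "oldname"), "Alice")
```

With this layout, a renamed account (`GitHubIdentity(42, "newname")`) still resolves to the same wiki user, while someone who later registers the freed-up name `oldname` under a different ID resolves to nothing.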

On the positive side, having a linked GitHub account would provide another recovery method in the case of lost password / email or hacked accounts. Also we could skip various anti-abuse checks we do for normal registrations (like captchas or throttling, which sometimes causes problems at IRL outreach events); we'd basically outsource these to GitHub (where the user already passed them when registering there). This assumes their anti-abuse features are sane (e.g. they verify email addresses and prevent mass bot registrations) and/or we verify the GitHub accounts ourselves (e.g. only skip the standard checks for accounts that are not new and have some activity).
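The "only skip the standard checks for accounts that are not new and have some activity" idea could look roughly like the check below. The thresholds and the choice of signals (account age, public repository count, verified email) are assumptions made for illustration; the discussion does not specify any of them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; real values would need to be agreed on.
MIN_ACCOUNT_AGE = timedelta(days=90)
MIN_PUBLIC_REPOS = 1


def looks_established(created_at, public_repos, email_verified, now=None):
    """Return True if captcha/throttling could reasonably be skipped."""
    now = now or datetime.now(timezone.utc)
    old_enough = now - created_at >= MIN_ACCOUNT_AGE
    active_enough = public_repos >= MIN_PUBLIC_REPOS
    return email_verified and old_enough and active_enough
```

Accounts that fail the check would simply go through the normal registration checks, so a wrong guess costs some friction rather than security.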

Privacy details

During login/registration, mediawiki.org learns the user's GitHub ID and email address. GitHub only learns the user's GitHub ID and IP (not their MediaWiki username), but it's relatively easy for them (or someone who can obtain their logs) to connect that to the wiki user via timestamps in the account creation log. Similarly, someone who can monitor the user's traffic can detect the mediawiki.org -> github.com -> mediawiki.org traffic pattern and connect their IP to their wiki identity through the account creation log. This is not a big change as unfortunately similar attacks are already fairly easy. See T216344: Hide account creation/autocreation times for possible mitigation.

IP addresses are personally identifiable information and as such should not be shared with third parties without user consent. Technically there would not be any "sharing" here, not any more than when including an external link in an article, but we might nevertheless want to make sure users understand what's happening, by providing some kind of information link.

Resourcing

Given all the fear, uncertainty and doubt around external login, I'd like to get consensus on the basic idea before trying to get commitment from budget owners or committing volunteer time. I think the AuthManager side is a few weeks' work at most for someone familiar with the codebase, writing the extension is a similar amount of time, plus a long tail of potential usability improvements for which there is no time pressure.

More details

Action items

  • Needs legal and privacy review.
  • Since we have a unified login system and other wikis cannot be fully isolated from mediawiki.org login features, they should be consulted.
  • Maybe a security review / threat analysis?
  • Try to gather more data on user demand (maybe check overlap in gerrit and github accounts? survey new mediawiki.org subscribers?)

Event Timeline


If we're talking about adding an external auth provider, what would the UX be when, after someone logs in via GitHub and gets used to that, we decide to remove external authentication? It sounds to me like this could be hard to undo.

There are two options to avoid locking out users:

  • Rely on the normal password reset procedure; for that everyone who uses Github login and does not have a password needs to have an email address. We could ask Github for the user's email address when they initially link the account (the benefit of this is that they do not need to bother with email verification in MediaWiki; the disadvantage is that AIUI it makes the login flow one step longer as it includes a permission request page that's not shown by default) or use some other method (e.g. T58028) to push people to set an email address. (Not having an email address is problematic with normal login as well - people forget their password and get locked out, or their account gets compromised and there is no way to contact them.)
  • Use the Github OAuth process for something akin to password reset (basically retain the Github login option but move it to the password reset form or a similar one, require users to set a password when they use it, and then do not allow using it for that account anymore). This would of course mean that the login option can be removed but the extension itself cannot.

I'm also not convinced at all that difficulty in recruiting volunteer developers is down to our relatively obscure (among the wider software development world) authentication system.

There certainly are bigger problems (lack of documentation, lack of code review, lack of an easy to use and discover support channel would be my guess at the top 3) and there are attempts to address those. No reason not to try fixing more than one issue at a time, though, and this would be a tangible improvement (I am sure the number of users who try to report a bug but then decide not to bother with captchas is nontrivial - I have given up on bug reports for non-Wikimedia products myself when the reporting process was too annoying), not a huge amount of work, and all that work would go into improving authentication in MediaWiki core and an extension with wide third-party utility.

This isn't available to logged out IP users?

It is but the user experience is poor (no notifications, captcha if you add links, your comments might be attributed to different IPs making conversation confusing).

If there are no technical blockers for this, I say go for it, provided we set up an experiment for measuring the impact of this change over a set period of time.

I'm also not convinced at all that difficulty in recruiting volunteer developers is down to our relatively obscure (among the wider software development world) authentication system.

I think it's great to question whether this will have an impact but think we should be curious here and try it out. Measure and conclude, rather than jumping to conclusions. Data is the only way to answer this once and for all.

During Google Code-in I know of three developers who contributed to Internet Archive rather than Wikimedia primarily due to GitHub. I've had emails from third parties wanting to send pull requests; I pointed them to Gerrit workflows and never heard back from them. I don't think an experiment in this arena would be harmful, particularly if we measured sign-ups via GitHub and patches to Gerrit.

If we were to do this, I'd worry that it would seem like we were promoting Github over other developer identity services.

I'd worry too, but I think we can mediate this by making clear this is an experiment based on the fact that this would be a large audience. We're an open source project so if it proves successful I'm sure we'd happily accept patches for adding other services (and if that happens it's a good project to have). If it doesn't prove successful, we remove it.

The easy thing to measure is what fraction of logins use it. We could set some limit (say 10%?) as a success measure. That could also be used as a basis to decide whether other opensource forges should be used for login integration (I think just adding everything that exists, whether there is a real demand or not, would bloat the user interface).
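The success measure above is a simple ratio check. A minimal sketch, using the 10% figure floated in this comment as the default; the function names and the idea of feeding in raw counts are assumptions.

```python
def github_share(github_count, total_count):
    """Fraction of logins (or signups) that went through GitHub."""
    if total_count == 0:
        return 0.0  # no data yet; treat as zero share
    return github_count / total_count


def meets_threshold(github_count, total_count, threshold=0.10):
    """True if the GitHub share reaches the agreed success threshold."""
    return github_share(github_count, total_count) >= threshold
```

For example, 12 GitHub logins out of 100 would pass a 10% threshold, while 5 out of 100 would not. The same check could be reused later to decide whether another provider sees enough real demand to justify a button.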

Whether it actually increases the number of signups that happen on mediawiki.org is harder to decide, due to seasonal variations, potential noise (spambot registrations etc), and the registration numbers fluctuating a lot even under normal conditions.

GitHub login in itself won't do much about adoption of Gerrit (we could add an OAuth permission for exposing the GitHub identity of the user to external tools like the patch uploader; but then they could also just use GitHub OAuth directly), but it could make it easier to ask support questions.

Have we audited Github's login code to ensure it's as secure as our own username+password system?


The TechCom will be holding an IRC meeting about this RfC next Thursday, February 14 at 07:00 UTC (08:00 CET, Feb 13 23:00 PST).

Instead of approving GitHub as a one-off, I'd rather we create a set of criteria that we use to evaluate 3rd party login providers, and then evaluate GitHub (and others that will inevitably be raised) against that. We're already effectively going to be creating said criteria in the questions that we ask during the meeting, so we might as well formalize it so we have a consistent answer to "but why not X?"

Here's a quick stab:

  1. There is a MediaWiki extension that provides the ability to use the login provider as a remote identity provider
  2. XX% of active MediaWiki developers have accounts on the login provider ("Practically all developers today have a Github account") or maybe XX% of developers we are trying to reach out to have accounts.
  3. The external login provider offers non-weak 2FA options (e.g. not just SMS)
  4. The external login provider has an active security team and practices responsible disclosure of security & data breaches
  5. The external login provider is ideologically aligned with Wikimedia and MediaWiki values and principles.
  6. There is some plan for if the external login provider is no longer available.

#1 is hopefully obvious because it's a technical requirement. #2 is so that whatever costs (development + maintenance time) go into this are likely to see benefits. #3 and #4 are security related, and #5 is because our movement is strongly based in our values, and we should be supporting our allies. There's also precedent in #5 from T61631. I don't know what #6 would look like.

@Krenair's question of auditing is also a good criterion, though I'm not sure whether we should be taking the time to do deep audits of other sites.

I think github clearly meets 1-4, #5 is still up for debate. They're ideologically aligned in supporting FOSS, but they themselves aren't, hypocritically.

As an aside, it would be interesting to see a list of external login providers for which #2 is true.

While I don't have anything against this RFC in principle, I think that it's aiming at the wrong target, as the main difficulty in starting MW development lies in Gerrit. Importing a key into it, getting git-review to work and familiarizing oneself with workflow basics is orders of magnitude harder, and this proposal doesn't address this. If we were serious about attracting more developers, we would receive much better returns from investigating how we can use GitHub.

Thanks for putting this list together, Legoktm. I think starting with the provider that will have the highest impact (that we have Github mirrors for our repos but not any other mirrors is in itself implicit acknowledgement of the fact that Github is in an entirely different class than its competitors, IMO) and seeing how that goes is more practical than evaluating all of them, but what we'd expect from such a provider in general is definitely a good discussion to have.

  2. XX% of active MediaWiki developers have accounts on the login provider ("Practically all developers today have a Github account") or maybe XX% of developers we are trying to reach out to have accounts.

What is easy to measure and is a good metric of impact is the number of logins or signups using the provider (compared to the total number of logins/signups, for example). Of course to measure that we'd have to deploy the extension first (and potentially someone has to put in the effort of writing it), which is not ideal. But checking how many people in our current userbase have accounts is not practical because it is not representative (we want this for new developers, the existing ones already have Wikimedia accounts; those might be entirely different demographics) and probably not easy to measure. In theory all repo providers tie user identities to publicly available commit author email addresses, so we could do something with that, but that seems like more work than it is worth.

Here are some numbers though that were easy to find: Github claims to have 30 million users and 2 million organizations, the BitBucket homepage says they have 6 million developers and 1 million teams, GitLab's says 100 thousand organizations (I wasn't able to find a user count). For a sense of scale, Wikimedia has about 1700 Gerrit users and all Gerrit repos together include a little less than ten thousand git commit authors (with some amount of duplication, probably).

  3. The external login provider offers non-weak 2FA options (e.g. not just SMS)

Why? We use them as a replacement for the password step of the login, we have no reason to care whether they provide a second factor at all. (Anyway there is a big difference between "offers" and "people actually use it".) If we want account security, we can do 2FA on our end.
Or do you mean it as a heuristic for security competence in general? We need providers to be competent enough to prevent mass bot account registration and similar abuse (since on our end we'd probably want to relax the usual new account checks) and also to not lose passwords easily, so that has some value.

  4. The external login provider has an active security team and practices responsible disclosure of security & data breaches

Yeah (although obviously the latter is hard to judge).

  5. The external login provider is ideologically aligned with Wikimedia and MediaWiki values and principles.

Again, why? If using external service X furthers the aims of the Wikimedia movement (or the MediaWiki community, if we are talking about mediawiki.org), then using it seems ideologically aligned by definition. There are a bunch of reasons to use opensource software (which in the end all boil down to expecting that it will be more effective or less risky at furthering our aims in the long term), but I don't think any of them apply to external login providers. What we are really using there is their user database, not their software - and that's always proprietary. If there were a non-proprietary source of user identities with a significant user base, that would be a good value-based candidate. Sadly, Persona did not work out (see, that was an occasion where we could have meaningfully supported an ally, but we are still poor at identifying strategic opportunities like that), and WebAuthn is too young at this point.

  6. There is some plan for if the external login provider is no longer available.

Asking for email addresses seems like the simplest approach to that. In the case of Github that means the user has to click through two screens (an authorization screen for identifying to mw.org and a grant authorization screen) instead of one, but we can live with that.
Alternatively, there is the less robust method of poking users to add an email address after they have been around for a while. We might not care much about users with 0 edits locking themselves out.

we should be supporting our allies

People have a habit of conflating "we should be supporting our allies" and "we should be using software written by our allies" even though most of the time these are not related at all. We are not supporting a login provider (ally or not) by using them; if they are significantly smaller than us they might benefit from the attention directed at them, but we have somewhere between 1,000 and 2,000 active mw.org users and even GitLab likely has hundreds of thousands of active users so that's really not the case here.

There are obvious ways of supporting our allies (giving them money, spending resources on improving their projects, using our voice to promote them), which mostly don't happen, in no small part because "values" gets thrown around instead of actual arguments, and that tends to not impress anyone who has a budget to manage. Values are not religious dogma that we learn watching others and then follow without thinking, values are rational expectations about what works well in the long term, just very heavily compressed. You need to be able to uncompress them in the case of disagreements.

So I would rather think about how to actually support our allies, and I don't think designating every FLOSS project in existence as an ally is useful for doing that - we need to identify projects we actually rely on (or would rely on if they were more mature), or which have a meaningful chance of furthering our goals. I don't think opensource login providers are one of those. And for code forge purposes we are already using an opensource project (not actually supporting it, mind you - whether we can do that would be a good discussion to have).

There's also precedent in #5 from T61631.

The only thing that task is precedent for is how to have terrible discussions. There are plenty of sensible reasons for or against using a specific login provider, but it seems like almost all were carefully avoided there.

@Krenair's question of auditing is also a good criterion, though I'm not sure whether we should be taking the time to do deep audits of other sites.

I took that one as good-natured trolling :) We don't audit any complex third-party software we rely on, even where we would have easy access to the source code (Phabricator, Gerrit, MariaDB, Freenode server software, Linux components etc.), not to mention closed-source software we rely on in far more crucial ways (Google's authentication system for example). It would be a fairly ridiculous expectation with the resources the WMF has. If you are Google or the NSA, maybe you can afford to do that kind of thing.

They're ideologically aligned in supporting FOSS, but they themselves aren't, hypocritically.

I'd argue a service for supporting opensource development that people actually use is a better ally of the FLOSS movement than one that is very philosophically aligned but not actually used by people, but as I said above, I think that's kind of tangential.

While I don't have anything against this RFC in principle, I think that it's aiming at the wrong target, as the main difficulty in starting MW development lies in Gerrit.

I don't see how that makes it the wrong target. The main difficulty in starting Wikipedia editing is writing your first article or nontrivial article expansion (and then dealing with the influx of policy references); that doesn't make anonymous editing and easy signup bad features for attracting new editors. Similarly, mediawiki.org registration is definitely not the main roadblock for becoming a developer, but by making it smoother you increase the number of people who even try to use Gerrit.

(Also, the target audience is wider - a Wikimedia account is needed for filing bugs to Phabricator, for registering OAuth apps, and helpful for asking questions on the wiki or fixing documentation. All those apply to a wider group of users than using Gerrit, and to an earlier stage in the user -> developer evolution. If the early steps are easy, by the time the user gets to the point of wanting to commit code to a Gerrit repo, hopefully they are invested enough in the community to survive the experience.)

If we were serious about attracting more developers, we would receive much better returns from investigating how we can use GitHub.

Which this proposal in no way prevents you from doing :) It might even help in some small ways if we can identify users' Github accounts.

(If somebody is interested, T136863: Should Wikimedia have standard policies for managing github mirror repos? is the low-hanging fruit IMO. T37497: Implement a way to bring GitHub pull requests into gerrit is probably not at all low-hanging but might be the most impactful one.)

One way to read this RFC's core proposal is "provide a way for users to authenticate to Wikimedia wikis without having to manage a new password." There is currently nothing proposed here that would change anything about an established SUL account other than this. I really think the only thing that we should be concerned about after approval is the "NASCAR problem" of how to manage the login page support for multiple authentication integrations.

I think any user logging in through such an external provider should (unless they also pass some extra check done on our side like 2FA) be at least limited to the rights provided to standard logged in users, i.e. anything extra like sysop is discarded.

I think any user logging in through such an external provider should (unless they also pass some extra check done on our side like 2FA) be at least limited to the rights provided to standard logged in users, i.e. anything extra like sysop is discarded.

I wouldn't expect GitHub to be any less reliable than our own system of password policies, abuse detection etc. That is not a high level of confidence, but then we don't force 2FA on sysops using password login, either. IMO:

  • We should replace our current nomenclature of privileged and unprivileged accounts with two different levels of privilege: security-sensitive accounts (can deploy JavaScript, access or leak private data, or manage user permissions which allow those things; potentially a few more things which could be extremely disruptive like mass import or deploying central notices (although currently those both allow deploying JavaScript anyway)) and abuse-sensitive accounts that cannot be used for security exploits in the traditional sense but can be used for highly disruptive vandalism (page deletion, interface text modifications, abuse filter changes, high-volume edits, bot flag, probably a lot of other things). Membership in the first group should require 2FA no matter the login method (or at a minimum 2FA should be required to actually use those privileges). Membership in the second group should not require 2FA, but would result in a more aggressive password policy and logging. (This is something we need to do anyway since we want to require 2FA for checkusers, bureaucrats, interface admins etc, but don't consider requiring 2FA for all admins viable.) I think sysops are not really sensitive today so I wouldn't worry about them logging in via GitHub. This task is probably orthogonal - no reason to think GitHub login is more risky than our current password-based login, so it should not be a blocker.
  • We should have audit logs for which method is used for a given login/signup (obviously). But logins live for a year and logs are only retained for 1-3 months so
  • we should have a way to track what login method a given session came from (and add that to all security audit logs, and have the ability to restrict or invalidate all such sessions). This is tricky because no session data/metadata is preserved on the server side. Some kind of cookie signing I guess? Would probably require modifying some AuthManager/SessionManager-related interfaces. Given that an attacker gaining access via GitHub and then not doing anything for months (until the audit logs expire) seems like a very remote possibility, I don't think this should be a strict blocker, either.
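The "some kind of cookie signing" idea could work roughly as follows: the server attaches the login method to the session cookie together with an HMAC over the session ID and the method, so the method can be read back and trusted later without any server-side session metadata. This is a generic sketch, not how SessionManager works today; the function names, the cookie layout and the secret handling are all assumptions.

```python
import hashlib
import hmac


def sign_login_method(session_id, method, secret):
    """Return a cookie value of the form '<method>.<hex HMAC>'."""
    # Binding the tag to the session ID stops it being copied to another
    # session. Assumes session IDs never contain the '|' separator.
    message = f"{session_id}|{method}".encode("utf-8")
    tag = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{method}.{tag}"


def verify_login_method(session_id, cookie_value, secret):
    """Return the login method if the signature checks out, else None."""
    method, _, _tag = cookie_value.rpartition(".")
    expected = sign_login_method(session_id, method, secret)
    # Constant-time comparison to avoid leaking the tag byte by byte.
    if hmac.compare_digest(expected.encode("utf-8"), cookie_value.encode("utf-8")):
        return method
    return None
```

With something like this in place, audit logs could record the verified method per request, and all sessions carrying a given method (say `github`) could be rejected at once by refusing that value, or invalidated wholesale by rotating the secret.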

If we had a more specific list of concerns (what are the attack scenarios where we know Wikimedia logins are well-protected but we have doubts about GitHub logins), maybe we could reach out to GitHub and have a chat with their security team (and do the same for any other external login providers if/when we plan adding them).

I think the privacy implications are a bit more nuanced than what you've stated, and should warrant a request to the legal department for a green light before we allow an external authentication provider to work on any site under our privacy policy, if we don't do this already (and AFAIK this would be a first, more or less).

@Joe definitely needs a legal review, but we need to lay out the privacy implications for that. What else do you think needs to be called out?

I want to thank @Tgr for pushing this proposal and stirring this discussion. I am learning a lot.

One way to read this RFC's core proposal is "provide a way for users to authenticate to Wikimedia wikis without having to manage a new password."

I agree, and the meeting notes seem to confirm this. It seems between hard and pointless to discuss (and implement) "Use Github login for mediawiki.org" without discussing "Use third party accounts for Wikimedia SUL". The structure of that discussion could be:

  1. Yes / No
  2. If Yes, what are the criteria to consider third parties.
  3. When the criteria are defined, which third party should we try first.

Potential (and likely) additional third parties would be discussed and agreed one by one.

There are questions about how to proceed with such community discussion in a way that is fair and productive. The answers are not straightforward, that is for sure. In my opinion...

  • The bad news is that this discussion is likely to be complex no matter what, probably requiring some kind of Foundation prioritization or mandate to discuss and agree.
  • The good news is that this topic is directly related to some of the Foundation's top priorities around new readers, growing participation, emerging communities, platform evolution...

So... even if starting with the bottom-up route and discussing this topic in cross-wiki spaces might feel tempting, I really think that this will lead somewhere only after getting on board and well aligned with Foundation programs like New Readers and Platform Evolution.

I think any user logging in through such an external provider should (unless they also pass some extra check done on our side like 2FA) be at least limited to the rights provided to standard logged in users, i.e. anything extra like sysop is discarded.

  • We should replace our current nomenclature of privileged and unprivileged accounts with two different levels of privilege: security-sensitive accounts (can deploy JavaScript, access or leak private data, or manage user permissions which allow those things; potentially a few more things which could be extremely disruptive like mass import or deploying central notices (although currently those both allow deploying JavaScript anyway)) and abuse-sensitive accounts that cannot be used for security exploits in the traditional sense but can be used for highly disruptive vandalism (page deletion, interface text modifications, abuse filter changes, high-volume edits, bot flag, probably a lot of other things). Membership in the first group should require 2FA no matter the login method (or at a minimum 2FA should be required to actually use those privileges). Membership in the second group should not require 2FA, but would result in a more aggressive password policy and logging. (This is something we need to do anyway since we want to require 2FA for checkusers, bureaucrats, interface admins etc, but don't consider requiring 2FA for all admins viable.) I think sysops are not really sensitive today so I wouldn't worry about them logging in via GitHub. This task is probably orthogonal - no reason to think GitHub login is more risky than our current password-based login, so it should not be a blocker.

What I'm saying is that while security-sensitive accounts should always require 2FA to use their extra rights, abuse-sensitive accounts should require 2FA to use their extra rights if the session was authenticated by some third party. Otherwise, we'll've just granted technical access to rights that are usually only available via RfA etc. to a third-party auth provider the moment a sysop allows external login to their account.
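
To make the rule above concrete, here is a minimal sketch of how a session's effective rights might be computed. The grouping of rights into the two sets is purely illustrative (no such split exists in MediaWiki configuration today), and the function name and login-method strings are assumptions.

```python
# Illustrative grouping only; the actual membership of each set would be
# a policy decision, not something defined anywhere today.
SECURITY_SENSITIVE = {"editsitejs", "checkuser", "userrights"}
ABUSE_SENSITIVE = {"delete", "editinterface", "abusefilter-modify"}


def effective_rights(rights, login_method, passed_2fa):
    """Drop sensitive rights from a session that has not passed 2FA.

    Security-sensitive rights always need 2FA. Abuse-sensitive rights
    need 2FA only when the session was authenticated by a third party.
    """
    rights = set(rights)
    if passed_2fa:
        return rights
    restricted = set(SECURITY_SENSITIVE)
    if login_method != "password":
        restricted |= ABUSE_SENSITIVE
    return rights - restricted
```

For example, a sysop with `{"edit", "delete"}` keeps `delete` after a password login, but a GitHub-authenticated session without 2FA is reduced to `{"edit"}`.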

Otherwise, yes I think this needs a community-wide global RFC.

Random (uninvited) thoughts:

As long as any 3rd-party-login still respects the following just like the current login:

  • autoblocks
  • Title blacklist on Meta
  • Blocks placed on the IP with "no account creation" trigger ON
  • AbuseFilter rules that possibly could prevent account creation

I am fine with this idea. If any of these criteria fail, the world will burn, with sockpuppeteers using the new method to create an unlimited number of sockpuppets.

Looking over the IRC log,

07:20:52 <tgr> yeah, most of the benefits depend on being able to trust that the external provider does a decent job vetting accounts (email validation, but also rate limiting etc)
07:21:22 <tgr> if we can assume that, which for github seems reasonable, we can skip a bunch of annoying checks on our side
07:21:56 <tgr> captcha, throttling (think mass subscriptions at an IRL event), autoconfirmed flag, email confirmation...

So someone who signs up for a new account on GitHub and immediately uses it to create a Wikimedia account would get to skip a bunch of our anti-abuse measures? That sounds like a terrible idea to me.

If you gate it on some sort of measurable level of positive activity on GitHub, then maybe. Details would probably be important.

GitHub is the largest code forge, with 31M developers (8M of whom joined last year)

OTOH, I wonder how many of those 31M or 8M are people who had to create an account to file a bug against some project, but otherwise have no desire to use GitHub?

As long as any 3rd-party-login still respects the following just like the current login:

  • autoblocks
  • Title blacklist on Meta
  • Blocks placed on the IP with "no account creation" trigger ON
  • AbuseFilter rules that possibly could prevent account creation

All of those should be respected, unless someone does something stupid outside of what AuthManager allows for. Those checks shouldn't even notice whether the user used a password or github for creating the account.

One way to read this RFC's core proposal is "provide a way for users to authenticate to Wikimedia wikis without having to manage a new password."

I agree. Another way to achieve that would be to use e-mail tokens. That is, the flow would be

  1. "Log in".
  2. Enter e-mail address (can be remembered or autofilled in-browser). Leads to interstitial while you check your e-mail and click a link.
  3. Link leads to autocreate URL (applies filters etc), and if not prevented, you're now immediately logged-in!

No passwords needed :-)
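
A rough sketch of the token half of that flow, assuming an in-memory store and a 15-minute lifetime (both hypothetical; the function names are made up for illustration). Sending the e-mail and the autocreate step itself are left out.

```python
import hashlib
import secrets
import time

TOKEN_TTL_SECONDS = 15 * 60  # hypothetical lifetime of an e-mailed link

# Maps a hash of the token to (email, expiry timestamp). Storing only the
# hash means a leaked store does not reveal usable tokens.
_pending = {}


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_login_token(email: str, now=None) -> str:
    """Create a single-use token to embed in the link mailed to the user."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _pending[_hash(token)] = (email, now + TOKEN_TTL_SECONDS)
    return token


def redeem_login_token(token: str, now=None):
    """Return the e-mail address for a valid token, or None.

    The entry is removed on first use, so a link cannot be replayed.
    """
    now = time.time() if now is None else now
    entry = _pending.pop(_hash(token), None)
    if entry is None:
        return None
    email, expiry = entry
    if now > expiry:
        return None
    return email
```

A successful `redeem_login_token` call is the point where step 3 would hand over to the autocreate checks (filters, blocks and so on) before logging the user in.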

This flow could be extended:

  • The first time you do this for an address, you'd see a one-click form where you confirm you really do want to create an account. This is similar to what various sites do when you authenticate via GitHub/Twitter/Facebook for the first time. Which further indicates that it is not just a temporary session that may leave behind traces in the form of an attributed edit, but that there's also an "account" of sorts, which users generally should be made aware of. I remember StackOverflow and IMDb working this way. Also, this form could have a "pick your username" field, if we decide that we don't want it to be mandatory to be identified in some generic way by your external ID.
  • If/when we allow 2FA to be used generally, and if/when we want it to be used for password-less accounts, then the destination of the link can be the 2FA screen for users who have that enabled.

Regarding e-mail confirmation, GitHub does require it during registration. But they don't require confirmation before general use of the account. You can in fact have multiple addresses associated, and verification is optional there as well.

A few of us (Reedy, Btongminh, myself, and others..) actually use this to have old SVN commits associated with our GitHub profiles. (I wonder what GitHub does if multiple accounts do this, maybe oldest wins?)

Screen Shot 2019-02-14 at 19.01.55.png (316×1 px, 38 KB)
Screen Shot 2019-02-14 at 19.11.34.png (808×1 px, 161 KB)
export.png (774×934 px, 58 KB)

A few of us (Reedy, Btongminh, myself, and others..) actually use this to have old SVN commits associated with our GitHub profiles. (I wonder what GitHub does if multiple accounts do this, maybe oldest wins?)

I think they special-cased @users.mediawiki.org - I had to file a support ticket to make it work with my account without verifying the address.

Summary of the IRC discussion:

  • TechCom seems OK with the proposal in a narrowly technical sense but feels the legal/social aspects of the project are outside their expertise/authority (see blockers below). They might be more comfortable approving something similar to more segregated technical spaces (like wikitech or Phabricator).
  • The external login provider knows your IP, browser fingerprint and external account, and it might be possible to connect that to the wiki username (which they don't receive; but mediawiki.org has a public registration log, and only about a hundred registrations per day, so probably not that hard to guess). Other large websites in general, and GitHub in particular, are probably less strict in handling private data than Wikimedia is, so users using external signup expose their private data more, potentially without understanding it.
  • The best way to avoid becoming dependent on the external login provider is to ask for permission to see their email address. How concerned are we about 1) the user changing their email address at the login provider, and us getting stuck with an outdated one? (In theory we could periodically re-query it, probably even without user interaction, but it's extra work.) 2) about the user getting locked out because their email address goes dead and they don't care to update it (neither on the wiki nor at the external login provider)?
  • Registration involves various anti-abuse measures (captcha and throttling to prevent mass bot registration, email address verification to prevent spam). When using an external login provider we can skip these and make signup more frictionless, but we need to be able to trust them to do equivalent checks on their side competently.
  • As long as the user understands what they are doing, this is fine privacy-wise: they consent to giving a small chunk of information to the external login provider, no one else is affected. Except relying on account vetting by the external provider allows us to skip problematic anti-abuse measures on our side. If those measures are a significant obstacle to registration (at least in some cases) - say a local tech event runs into the IP throttling limit, so after a while new users can sign up with external login but not otherwise - does that still count as voluntary consent?

Blockers / action items:

  • Needs to go through legal and privacy review.
  • Given the unified login system, this is going to affect content wikis in a sense (since I can go and sign up on mediawiki.org and then edit on enwiki with that account) so they should probably be consulted in some way (meta RfC?).

I handwaved a bit on IRC about how I think everything should be moving to SUL (maybe not SUL in its current anyone-who-can-edit-javascript-can-steal-your-account form but some form of unified Wikimedia login). The related task is T189531: All Wikimedia developer services should use single sign-on. (@Joe if you think someone has different plans and is unaware of this task please connect us!)

  • The bad news is that this discussion is likely to be complex no matter what, probably requiring some kind of Foundation prioritization or mandate to discuss and agree.

In general I'm not a fan of making the Foundation a gatekeeper to even starting discussions. Being a staff member (and located in the SF office), I'm privileged in that I could probably convince people to hold some kind of formal WMF-managed discussion. Most of our developer community is not, and IMO we should be conscious and cautious about when we force people to rely on that privilege.

So... even if starting with the bottom-up route and discussing this topic in cross-wiki spaces might feel tempting, I really think that this will lead somewhere only after getting on board and well aligned with Foundation programs like New Readers and Platform Evolution.

I guess one way of looking at this is that people in those programs should at a minimum be aware (and then if they feel a bottom-up discussion is going to be a problem for them, they can object). How would that look in practice? Notify Community Relations since they are generally the ones to keep track of ongoing conversation?
(Or was that already an objection? :)

What I'm saying is that while security-sensitive accounts should always require 2FA to use their extra rights, abuse-sensitive accounts should require 2FA to use their extra rights if the session was authenticated by some third party. Otherwise, we'll've just granted technical access to rights that are usually only available via RfA etc. to a third-party auth provider the moment a sysop allows external login to their account.

Standard threat analysis involves an estimation of likelihood, value to the attacker, and potential damage. A major company abusing their identity API to gain access under a false identity to a Wikimedia account would be massively unethical, a PR catastrophe for them, and almost certainly illegal in the US under the CFAA. It would be of no value whatsoever to the external provider or for an employee there abusing their authority. The damage for us would be minimal, some annoying vandalism maybe. I continue to see this as a complete non-issue.

As long as any 3rd-party-login still respects the following just like the current login:

  • autoblocks
  • Title blacklist on Meta
  • Blocks placed on the IP with "no account creation" trigger ON
  • AbuseFilter rules that possibly could prevent account creation

By default everything works the same way for any kind of login. For AF/blacklists that is just fine. For IP blocks/autoblocks we could choose to exempt external logins if we want, on account that these cause collateral damage and an external login provider "vouching" for the user is a good way to sort out good users from the bad. If we do this, we'd probably limit it to accounts which look reputable (e.g. in the case of GitHub, they are not brand new and had a certain amount of activity).

So someone who signs up for a new account on GitHub and immediately uses it to create a Wikimedia account would get to skip a bunch of our anti-abuse measures? That sounds like a terrible idea to me.
If you gate it on some sort of measurable level of positive activity on GitHub, then maybe. Details would probably be important.

Someone who just signed up on GitHub is not a relevant use case for us (no one is going to create a GitHub account just so they can log in to MediaWiki that way; no one with good intentions anyway). The target audience is people who use GitHub enough that they are logged in there most of the time. So yeah, this would be conditional on the GitHub account being "in good standing", however we define that. Details are important but also somewhat pointless to discuss preemptively, in absence of actual experience.
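
Purely as an illustration of what "in good standing" could mean, here is a sketch that checks account age and a crude activity signal. The thresholds are invented, and the input is assumed to be a dict shaped like GitHub's public user profile (with `created_at` as an ISO 8601 UTC timestamp and a `public_repos` count); a real definition would need the details this thread says are still open.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; nothing in this discussion fixes these values.
MIN_ACCOUNT_AGE = timedelta(days=90)
MIN_PUBLIC_REPOS = 1


def looks_established(profile, now=None):
    """Return True if the external account is old and active enough."""
    now = now or datetime.now(timezone.utc)
    created = datetime.strptime(
        profile["created_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    old_enough = now - created >= MIN_ACCOUNT_AGE
    active_enough = profile.get("public_repos", 0) >= MIN_PUBLIC_REPOS
    return old_enough and active_enough
```

Accounts failing the check would not be rejected, only sent through the usual captcha, throttling and autoconfirmation path like any other signup.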

OTOH, I wonder how many of those 31M or 8M are people who had to create an account to file a bug against some project, but otherwise have no desire to use GitHub?

If they only use GitHub to file issues (or just like to see their face in the corner while browsing code), that would be fine with us. The more relevant question is, how many of those people use GitHub regularly enough that logging in via it is actually a convenience to them?
Which of course we don't know. The only way to find out is to actually do it, then measure the number of signups we get that way (and back out if it turns out to be unimpressive).

Another way to achieve that would be to use e-mail tokens.

I guess, although that seems a way more distracting workflow for me (and possibly confusing since other sites don't do that).
Persona would have unified the benefits of the approaches: you log in with your email, the site asks your browser to validate it. If you have never done this before for any website in that browser, it sends you to a trusted identity verification service; that service uses an email token (or something more relevant like Google OAuth if it can guess from your email address) to verify you; after successful verification your browser stores a public/private key pair and passes the public one to the initiating website in lieu of a password, and the verifying site attests the authenticity of the email address. If you have ever done this before, your browser already has the key pair and attestation, so it's one click signup.
It was a really cool idea, shame it did not work out. Although WebAuthn preserved parts of the original concept.

  • If/when we allow 2FA to be used generally, and if/when we want it to be used for password-less accounts, then the destination of the link can be the 2FA screen for users who have that enabled.

That will certainly be something interesting to look at once WebAuthn support becomes more widespread. Although you'd still need an email confirmation step.

Regarding e-mail confirmation, GitHub does require it during registration. But they don't require confirmation before general use of the account. You can in fact have multiple addresses associated, and verification is optional there as well.

I'm fairly sure I had to verify those. In any case, those are not the ones we would receive or care about - we'd get the email GitHub uses to communicate with the user (as we should, since we also need it for communication).

There was discussion about improving privacy by limiting user creation log access; filed T216344: Hide account creation/autocreation times about that.

Standard threat analysis involves an estimation of likelihood, value to the attacker, and potential damage. A major company abusing their identity API to gain access under a false identity to a Wikimedia account would be massively unethical, a PR catastrophe for them, and almost certainly illegal in the US under the CFAA. It would be of no value whatsoever to the external provider or for an employee there abusing their authority. The damage for us would be minimal, some annoying vandalism maybe. I continue to see this as a complete non-issue.

I think the threat analysis would be a little more complicated than that. I agree github being evil is unlikely, and probably not something to worry too much about unless the gain in being evil would be significant [which may be true of certain priv'd accounts].

But one does not have to be evil to be hurtful. For example, we allow people to take over other user accounts under certain circumstances (usurpation). Does github allow that? I have no idea [I honestly haven't looked into trying to find any policies or what not]. By outsourcing account management we outsource account management policies. Maybe Github's doesn't align with what our expectations are. Has github even written down what their policies are on these types of matters?

Then there's the potential for people to maliciously abuse github's systems to attack us. AFAICT the main argument against this risk is that github is a major tech company, and probably is competent or at least as competent as we are. And fair enough, that's probably true. But it seems like reputation is a very wishy-washy criteria to make a decision about this on.

Then there is high-privileged accounts (lets say checkuser or interface-admin) [I think this was Krenair's main point]. If these accounts are attached via github, well suddenly the potential fall-out is significantly worse. This would go much beyond annoying vandalism. In particular, checkuser is privacy sensitive to a lot of our users - could github potentially be coerced (e.g. Legally but also in other ways) to hand over a checkuser prived account?

I'm not saying that any of these things are show-stopper risks - they are after all low-likelihood and may be reasonable risks to accept. But I think there is enough here, and this is a new enough area for us, that it deserves a more full risk analysis.

For example, we allow people to take over other user accounts under certain circumstances (usurpation).

We don't really, we allow them to be renamed to free up the username. The user ID is unchanged so an intelligent OAuth downstream would notice something is wrong.
I think the GitHub situation is similar, as long as we use IDs we should be OK. (Usernames are definitely not safe. They can be changed; per their user deletion policy "The account name also becomes available to anyone else to use on a new account"; and they also have a kind of usurpation policy.)
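
The point about IDs can be shown with a small sketch: the link between an external account and a local one is keyed on the provider's stable numeric ID, and the username is never consulted. The class and method names here are hypothetical.

```python
class ExternalAccountLinks:
    """Maps (provider, external numeric ID) to a local user ID.

    Usernames are deliberately not part of the key: they can be renamed,
    freed on deletion, and later reused by someone else.
    """

    def __init__(self):
        self._by_external_id = {}

    def link(self, provider: str, external_id: int, local_user_id: int) -> None:
        self._by_external_id[(provider, external_id)] = local_user_id

    def resolve(self, provider: str, external_id: int):
        """Return the linked local user ID, or None if there is no link."""
        return self._by_external_id.get((provider, external_id))


links = ExternalAccountLinks()
links.link("github", 12345, 42)       # original account, whatever its name
same_person = links.resolve("github", 12345)   # still 42 after a rename
newcomer = links.resolve("github", 67890)      # None, even if the old name was reused
```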

By outsourcing account management we outsource account management policies. Maybe Github's doesn't align with what our expectations are.

In general that's a very good point, thanks. Probably something that belongs to @Legoktm's list of requirements.
The two issues I can think of are organizational accounts (we should probably ban using those, if it's even possible to use them) and account recovery. (In general there's no reason to distrust GitHub's processes, but if the GitHub account does not own anything interesting, they might be less careful about recovery. Although the processes as stated there do not seem social-engineerable.)

Then there's the potential for people to maliciously abuse github's systems to attack us. AFAICT the main argument against this risk is that github is a major tech company, and probably is competent or at least as competent as we are. And fair enough, that's probably true. But it seems like reputation is a very wishy-washy criteria to make a decision about this on.

They have three times the revenue and staffing of Wikimedia, that clearly extends to their security teams (they have five of them), and account security is business-critical to them. I don't see why that would be wishy-washy or what other criteria could we use.

Then there is high-privileged accounts (lets say checkuser or interface-admin) [I think this was Krenair's main point]. If these accounts are attached via github, well suddenly the potential fall-out is significantly worse. This would go much beyond annoying vandalism. In particular, checkuser is privacy sensitive to a lot of our users - could github potentially be coerced (e.g. Legally but also in other ways) to hand over a checkuser prived account?

That still seems unrealistic to me but I agree we should protect such accounts (which we plan to do anyway). A required 2FA step on our side seems a sufficient countermeasure for that.

But I think there is enough here, and this is a new enough area for us, that it deserves a more full risk analysis.

Would you or your team be willing to perform a proper review?

Per the IRC discussion (and common sense) this requires legal review. TechCom people, can you make that happen? (Past experience says if I ask for it as a random developer, it will take approximately forever.) I tried to summarize the privacy context in the corresponding section of the task description.

I pinged the stewards, as recommended in the IRC discussion. Also left a notice at the mediawiki.org community portal (which I stupidly forgot to do before).

I pinged the stewards, as recommended in the IRC discussion. Also left a notice at the mediawiki.org community portal (which I stupidly forgot to do before).

Thank you for the ping. I have relayed your email through the mailing list (non-members posts get moderated, but the message is now approved).

I cannot comment much for now, but I note that GitHub does allow lowercase usernames, while MediaWiki does not. Would that be an issue if this proposal succeeds? (I am not asking to allow all-lowercase usernames on MediaWiki now, as that'd be a source of much abuse and impersonation nowadays.) Thanks.

Someone who just signed up on GitHub is not a relevant use case for us (no one is going to create a GitHub account just so they can log in to MediaWiki that way; no one with good intentions anyway). The target audience is people who use GitHub enough that they are logged in there most of the time. So yeah, this would be conditional on the GitHub account being "in good standing", however we define that. Details are important but also somewhat pointless to discuss preemptively, in absence of actual experience.

You omitted mentioning that when you originally proposed bypassing anti-abuse measures. You still seem to be handwaving it in other replies.

OTOH, I wonder how many of those 31M or 8M are people who had to create an account to file a bug against some project, but otherwise have no desire to use GitHub?

If they only use GitHub to file issues (or just like to see their face in the corner while browsing code), that would be fine with us. The more relevant question is, how many of those people use GitHub regularly enough that logging in via it is actually a convenience to them?

You're the one who's making a big deal over the numbers, and making assumptions based on them that "practically all developers today have a Github account" and would want to use them as an authentication method for our infrastructure.

Which of course we don't know. The only way to find out is to actually do it, then measure the number of signups we get that way (and back out if it turns out to be unimpressive).

You could try doing some user surveys before just jumping ahead and implementing it. For example, ask people who recently created an account on mediawiki.org and used it for Phab or Gerrit whether they have a github account and whether they'd have liked to be able to use it when creating the account on mediawiki.org. Or ask people who interact with our mirrors on github whether they'd use Phab/Gerrit if they could use their github account to authenticate on mediawiki.org.

I cannot comment much for now, but I note that GitHub does allow lowercase usernames, while MediaWiki does not. Would that be an issue if this proposal succeeds

The local username doesn't need to match the github username at all. Any local username, whether based on the github name or not, would still need to pass all the usual checks, including our validity requirements and anti-spoof.

You omitted mentioning that when you originally proposed bypassing anti-abuse measures. You still seem to be handwaving it in other replies.

I'm still handwaving it, I think it is an implementation detail that is hard to meaningfully discuss without seeing what kind of abuse, if any, happens via GitHub logins.

You could try doing some user surveys before just jumping ahead and implementing it. For example, ask people who recently created an account on mediawiki.org and used it for Phab or Gerrit whether they have a github account and whether they'd have liked to be able to use it when creating the account on mediawiki.org. Or ask people who interact with our mirrors on github whether they'd use Phab/Gerrit if they could use their github account to authenticate on mediawiki.org.

Yeah, that might be worth a try.

This is becoming a monster of a thread for context and discussion, thanks to @Tgr for continually updating the description for overview. Mostly @Bawolff mirrors my concerns but I had some thoughts that I don't see totally reflected in the discussion to this point.

From the description IIUC the main motivation here is to provide ease of entry for technical contributors by hooking into a commonly used platform for developers to avoid as many wiki specific hurdles as possible, or at least to allow folks over there to come over here without managing a new identity and password.

What's the increased activity we expect to see here?

I have not encountered anyone who has told me that our account creation process or requiring an account is preventing them from participating. I'm wondering how we know the negative impact here is real, especially with the high cost (at least I expect) of integrating and maintaining this in our ecosystem. i.e. the "problem statement" here isn't concrete for me. It could be one person a year, or one thousand but it doesn't sound like we know.

It seems probably misleading for developers coming from Github as they will be expected to create an account to interact on Gerrit anyway, and that account is not consolidated. If that's the case, have we lost any benefit?

@Legoktm for me has the best approach here. Github or ? aside, is this model viable for us?

https://phabricator.wikimedia.org/T215046#4939924

#2 there I think speaks to the lack of concrete problem statement as well.

Are there more TechCom sessions coming?

The previous one I knew about was around 1 AM my time and difficult to attend.

You omitted mentioning that when you originally proposed bypassing anti-abuse measures. You still seem to be handwaving it in other replies.

I'm still handwaving it, I think it is an implementation detail that is hard to meaningfully discuss without seeing what kind of abuse, if any, happens via GitHub logins.

I would expect all the same abuses we encounter now at the very least, and maybe some we haven't seen based on new interactions, which I think @revi has also indicated in https://phabricator.wikimedia.org/T215046#4954067

Summary of the IRC discussion:

  • TechCom seems OK with the proposal in a narrowly technical sense but feels the legal/social aspects of the project are outside their expertise/authority (see blockers below). They might be more comfortable approving something similar to more segregated technical spaces (like wikitech or Phabricator).

I was surprised we are targeting mediawiki.org as well. A consolidated authentication scheme for "Mediawiki Developers" across phab/gerrit/etc with integration for 3rd parties seems to be closer to the outcome desired. I'm not quite clear for Github OAUTH2 authentication how SUL plays into this, is it true in this proposal we want Github folks to be able to authenticate but /only/ to mediawiki.org?

That seems like it has a long list of weird outcomes with existing SUL identity handling to my naive ears (after digesting what's written on the task here), and the only venue where that's currently useful, to my knowledge, is Phabricator? I'm probably misunderstanding there.

  • Registration involves various anti-abuse measures (captcha and throttling to prevent mass bot registration, email address verification to prevent spam). When using an external login provider we can skip these and make signup more frictionless, but we need to be able to trust them to do equivalent checks on their side competently.

These anti-abuse measures really are critical, and outsourcing them is going to come at great cost I expect.

It may be unpopular to ask this last question, but I mean it in a truly good-faith sense :) Is this being pursued in the course of work for Wikimedia Foundation priorities or is this being pursued as a volunteer-maintained feature, and by whom?

I have not encountered anyone who has told me that our account creation process or requiring an account is preventing them from participating.

(In FY2017-2018 @srishakatux sent out four surveys to all new developers in a quarter who put a changeset into Wikimedia Gerrit. https://www.mediawiki.org/wiki/New_Developers/Quarterly#Summary_of_key_findings states that "New developers continue to struggle with the code contribution process" but I do not know if we had explicit "Why did I have to set up a separate account in times of OAuth" replies; maybe Srishti knows.)

I have not encountered anyone who has told me that our account creation process or requiring an account is preventing them from participating.

(In FY2017-2018 @srishakatux sent out four surveys to all new developers in a quarter who put a changeset into Wikimedia Gerrit. https://www.mediawiki.org/wiki/New_Developers/Quarterly#Summary_of_key_findings states that "New developers continue to struggle with the code contribution process" but I do not know if we had explicit "Why did I have to set up a separate account in times of OAuth" replies; maybe Srishti knows.)

I have no data, but given what our code contribution process is, I find it hard to imagine that signing up for an account is the part of the code contribution process new contributors struggle with.

(In FY2017-2018 @srishakatux sent out four surveys to all new developers in a quarter who put a changeset into Wikimedia Gerrit. https://www.mediawiki.org/wiki/New_Developers/Quarterly#Summary_of_key_findings states that "New developers continue to struggle with the code contribution process" but I do not know if we had explicit "Why did I have to set up a separate account in times of OAuth" replies; maybe Srishti knows.)

New developers reported difficulties in understanding Gerrit and coding conventions and struggled with a slow code-review process and missing documentation. I don't recall anything around account creation that came up in the survey results.

TechCom further discussed this and while we could approve a technical implementation we can't speak to the product side of things. We need a Product Owner that supports this before we further review the technical RFC.

Could we try this on a test wiki rather than deploying it on MediaWiki.org? As it stands, I'm not comfortable implementing such a drastic change directly.

It really does not matter which wiki in the main Wikimedia farm this is enabled on. Once it is enabled on one then it is functionally enabled on all of them in our post-SUL world. The only way to isolate testing would be in a non-SUL wiki, and we really don't have any "testing" wikis of that nature in production that I am aware of. To me this really means any experiments should be done in deployment-prep or another designated testing environment. This would be needed anyway for any testing of new extensions to enable this.

chasemp triaged this task as Medium priority. Dec 9 2019, 4:30 PM
Krinkle removed a project: Wikimedia-GitHub.

Closing old RFC that is not yet on our 2020 process and does not appear to have an active owner. Feel free to re-open with our template or file a new one when that changes.

The outcome of the old process was that this is outside TechCom scope (it is a community or product decision they felt ill-equipped to make). So the technical discussion is blocked on getting community or product support. Reopening it as it is as an RfC probably won't do much good.

Having it open as a (blocked) Phase 1 RFC is totally fine, no bother at all. I closed it as it appeared there was no one currently intending to do the product/community work. If there is any intent, even vaguely, then it's no bother at all having it on the board!