RFC: Allow user login with email address in addition to username
Open, LowPublic

Description

Author: brassratgirl

Description:
Apologies if this is a duplicate.
It was suggested to me (and so I am passing it on) that we allow login to the projects with an email address as well as a username. So the field would read Username (or email address):

The idea behind this is that for casual editors, remembering one's username is a pain, especially on a big project like Wikimedia where many common usernames may already be taken. Trying to recover a lost username and password takes up valuable time, is frustrating, and may be a slight barrier to entry for casual editors.

RfC: https://www.mediawiki.org/wiki/Requests_for_comment/Login_via_e-mail_address
Mailing list thread: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/81760

Details

Reference
bz28085

GWicke added a subscriber: GWicke.EditedSep 16 2015, 8:38 PM

One concern I have is about the handling of duplicate email addresses:

  1. How many duplicate e-mail addresses exist in our database? How popular are the most popular e-mail addresses?
  2. The current patch seems to reject login attempts when duplicate addresses exist. How bad would the user experience be in this case? How common would this case be?
daniel moved this task from Inbox to Under discussion on the TechCom-RFC board.Sep 16 2015, 8:43 PM
Qgil added a comment.Sep 16 2015, 9:07 PM

Assuming that email addresses are personal by default and assuming that this functionality would be only available to email addresses that have been confirmed, the use cases of someone running into a duplicate would be

  1. a person having multiple accounts and remembering them -- these would have the means to solve the problem if it bothers them, either by changing email addresses or by logging in with the username only.
  2. a person having multiple accounts but not remembering all of them -- happened to me (I don't recall the first username I used to edit Wikipedia and I never invested enough time to remember it, if I used an email address that I'm still using today in my current account, then I will have a duplicate).
  3. different persons using some alias email address -- they would be aware of the fact that the email address is not unique and could solve the problem adding another address for each account.

The second case is the one that would leave a user stuck, not being able to use or recover their email address. Still, it would be kind of their fault, and I wonder how big the impact would be.

I still think that the feature is worth these little problems that it might cause for some. New accounts could have an email address check (if none exists today), so we would not have this problem with new users, who are probably the ones who expect this feature most.

An account selector could be one solution for email address duplication, but it still requires us to deal with timing attacks and may result in longer login processing time for all users. I believe there is no way to prevent timing attacks completely. We have to choose between security and convenience.

@devunt, would you like to schedule this for an RFC meeting in the usual weekly time slot, maybe September 30?

I have heard that mobile wants this feature. So there is the question of whether it should be exposed via the desktop UI at all, or if that should be configurable. There is a usability disadvantage to enabling this, if we can't tell the user whether or not their password was wrong.

Also, I would like to know about the UI proposed for mobile -- is the idea to autofill the email address in an app? Or would there be a single box for username or email address, like in the proposed patch for the desktop UI? Should API clients opt in to this, and should they be able to indicate that the name they give is definitely an email address, or should the lgname parameter just be reinterpreted as in the proposed patch?

If mobile does want this feature, they should contribute development time.

@tstarling I'm afraid I'm not available every Wednesday at 21:00 UTC. On Wednesdays, I can meet after midnight or before 14:00 UTC.

@devunt: sorry for the late notice, but would you be able to meet at 2015-09-31 0:00 UTC to discuss this?

@RobLa-WMF yes. It would be okay to me.

@devunt: sorry for the late notice, but would you be able to meet at 2015-09-31 0:00 UTC to discuss this?

What? September 31 doesn't exist.

And please try to use "00:01" or "23:59" instead of "0:00" for clarity's sake.

What? September 31 doesn't exist.

Ugh, sorry, thinko. I'm keeping the timing as it is on E74 (on the hour) since I'm going to trust others to do the conversion better than I did in adding a day to September 30. :-)

yes. It would be okay to me.

Thanks @devunt! Talk to you tomorrow!

Tgr added a subscriber: Tgr.Oct 1 2015, 1:05 AM
Tgr added a comment.Oct 1 2015, 1:14 AM

(Subjective) summary of the IRC discussion: everyone is super happy this is happening, @csteipp needs to do another review pass on the patch, another patch for CentralAuth is needed for this to be useful on WMF servers, needs some coordination with AuthManager (T89459) but probably will just work, lots of discussion on timing attacks but everyone seemed to agree that in the end resisting them is not terribly important, emails used by multiple accounts should be refused but it's not entirely clear what's the best way to do that in a user-friendly manner.

Tgr added a comment.Oct 1 2015, 1:17 AM
< TimStarling> so if there is a duplicate email, we can just pick an account randomly and validate the supplied password against it
< TimStarling> if it matches, then we can report a user-friendly error message
< TimStarling> like "this email address is used by more than one account, please log in with your username"
< devunt> I think we have a basicially 3 choices here
< devunt> 1. Only accepts if there is only one email address in the database
< devunt> 2. Checking the password of randomly selected account (as TimStarling mentioned)
< devunt> 3. Checking passwords of all accounts that have the same email address

where 1 has poor usability (why does my password not work?), 2 can be random and confusing if multiple accounts with different passwords use the same email, 3 is timing attack land again.
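The three options listed above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the patch: the `Account` record, the helper names and the plain SHA-256 hashing are all assumptions made for the example (a real system would use a slow, salted password hash), and option 2 follows devunt's wording rather than the error-message variant TimStarling described.

```python
import hashlib
import hmac
import random
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record; the field names are not MediaWiki's.
    name: str
    email: str
    password_hash: str


def hash_password(password: str) -> str:
    # Placeholder hash for illustration only (unsalted, fast).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_password(account: Account, password: str) -> bool:
    return hmac.compare_digest(account.password_hash, hash_password(password))


def login_unique_only(accounts, email, password):
    """Option 1: accept only if exactly one account uses the address."""
    matches = [a for a in accounts if a.email == email]
    if len(matches) != 1:
        return None
    return matches[0] if check_password(matches[0], password) else None


def login_random_pick(accounts, email, password, rng=random):
    """Option 2: check the password against one randomly chosen match."""
    matches = [a for a in accounts if a.email == email]
    if not matches:
        return None
    candidate = rng.choice(matches)
    return candidate if check_password(candidate, password) else None


def login_check_all(accounts, email, password):
    """Option 3: check every match; the work grows with the number of matches."""
    hits = [a for a in accounts if a.email == email and check_password(a, password)]
    # Refuse when zero or several accounts accept the password.
    return hits[0] if len(hits) == 1 else None
```

In this sketch option 1 never hashes a password for a shared address, option 2 can fail for a correct password depending on which account is drawn, and option 3 does one hash per matching account, which is the timing difference discussed in this thread.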

Qgil added a comment.Oct 1 2015, 9:04 AM
In T30085#1691937, @Tgr wrote:

where 1 has poor usability (why does my password not work?)

"We cannot log you in because this email address is being claimed by more than one Wikimedia account. You can log in using your username, and you can change your email address in your Preferences."

When you do a password reset against an email address, don't we already provide a list of all the matching accounts?

devunt added a comment.Oct 1 2015, 9:26 AM

"We cannot log you in because this email address is being claimed by more than one Wikimedia account. You can log in using your username, and you can change your email address in your Preferences."

This message seems too long to fit together with the other messages on the small login screen. The current error message is "Incorrect username, email address or password. Or maybe the email address is connected to multiple accounts. Usernames are case sensitive, while email addresses are case insensitive." which I think is already long enough.

Niharika removed a subscriber: Niharika.Oct 1 2015, 9:35 AM
Qgil added a comment.Oct 1 2015, 9:45 AM

I wasn't trying to provide the final copy, just to prove the point that a sensible error message is possible. Still:

This email address is connected to multiple accounts. Please log in with your desired username.
Mdann52 added a subscriber: Mdann52.Oct 1 2015, 1:05 PM

When you do a password reset against an email address, don't we already provide a list of all the matching accounts?

Just tested this, and it appears so.

In T30085#1691937, @Tgr wrote:

where 1 has poor usability (why does my password not work?)

"We cannot log you in because this email address is being claimed by more than one Wikimedia account. You can log in using your username, and you can change your email address in your Preferences."

Isn't the timing attack we're trying to avoid one that exposes whether an email address is used on a wiki? That would be confirmed if the response time matched the time it takes to check 2 passwords, indicating that 2 users use the same email.

In that case, wouldn't this message go right past the timing attack and directly expose the information we're trying to avoid exposing?
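For context, one common way to reduce the timing difference itself is to do the same amount of password-hashing work whether or not an account matched. The sketch below assumes PBKDF2 from Python's standard library and a hypothetical `verify` helper; it is not what the patch does, and it does nothing about the explicit error message questioned above or about addresses shared by several accounts.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # hypothetical work factor, kept low for the example


def slow_hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Dummy credentials hashed once at startup, used when no account matches.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = slow_hash("dummy", DUMMY_SALT)


def verify(stored, password: str) -> bool:
    """`stored` is a (salt, hash) pair, or None when no account matched."""
    salt, expected = stored if stored is not None else (DUMMY_SALT, DUMMY_HASH)
    # Always run one full hash so both paths cost roughly the same.
    candidate = slow_hash(password, salt)
    ok = hmac.compare_digest(candidate, expected)
    # A match against the dummy hash must never count as a login.
    return ok and stored is not None
```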

mxn added a subscriber: mxn.Oct 2 2015, 3:41 AM

I'd like to get comments about normalising email addresses. The current implementation will convert every email address in the user table to lowercase, and will also save new email addresses in lowercase.

I think MediaWiki should treat all email addresses as lowercase, but I'm a little hesitant to normalise every single email address in the database because it would take a long time, so it might be overkill.

I'd like to get comments about normalising email addresses. The current implementation will convert every email address in the user table to lowercase, and will also save new email addresses in lowercase.

I think MediaWiki should treat all email addresses as lowercase, but I'm a little hesitant to normalise every single email address in the database because it would take a long time, so it might be overkill.

https://lists.wikimedia.org/pipermail/wikitech-l/2015-February/080981.html

Email addresses are practically case-insensitive, aren't they? It is very hard to find websites that consider email addresses as case-sensitive.

Email addresses are practically case-insensitive, aren't they? It is very hard to find websites that consider email addresses as case-sensitive.

According to section 2.3.11 of RFC 5321,

the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

Email addresses are practically case-insensitive, aren't they? It is very hard to find websites that consider email addresses as case-sensitive.

According to section 2.3.11 of RFC 5321,

the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

Right, but devunt's point is that in practical terms, almost nobody has an e-mail server in which Ricordisamoa@domain.com goes somewhere different than ricordisamoa@domain.com, despite the fact that the RFC technically allows such a distinction.

Case-sensitivity for e-mail addresses came up previously during a discussion about whether a user re-entering his or her e-mail address at Special:Preferences with only a different case should trigger MediaWiki to re-send a verification e-mail. In that previous discussion, it was decided to follow the RFC and consider User@example.com different from user@example.com, but I continue to think that was the wrong decision for MediaWiki.

Tgr added a comment.Oct 5 2015, 12:31 AM

The RFC discourages case-sensitive email addresses (StackExchange has a good discussion on this). They are not unheard of, but the chance of two users having the exact same email address apart from case is basically zero, and even if that happens treating them as same wouldn't be problematic (those two users couldn't log in via email, which would be more than offset by all the users who could log in despite not remembering their casing correctly). So yes, case-insensitive match for login should be fine (if there are no problems with DB performance).

Changing stored email addresses is another matter. That would completely break email functionality for the odd user with significant uppercase in their address while it wouldn't offer much benefit.

devunt added a comment.Oct 5 2015, 4:38 AM

@Tgr Then keeping old email addresses as same as now and only saving new email addresses as lowercase would be ideal?

@Tgr Then keeping old email addresses as same as now and only saving new email addresses as lowercase would be ideal?

Saving even new addresses as lowercase will still break email functionality for the odd user with significant uppercase in their email address. It just means that it'll only break new emails.

Saving even new addresses as lowercase will still break email functionality for the odd user with significant uppercase in their email address. It just means that it'll only break new emails.

Have we found one of these odd users yet? Seriously, is there a concrete example in the wild of case mattering in an e-mail address?

The user experience benefit to "User@example.com" being equivalent to "user@example.com" seems really clear to me. The potential detriments to case-insensitivity seem to be limited to the theoretical world. This seems like a clear case where MediaWiki should follow the principle of "Do What I Mean" and treat "User@example.com" the same as "user@example.com". (I'm personally also strongly of the belief that MediaWiki usernames should behave similarly; "jimbo wales" being different from "Jimbo Wales" is kind of horrible.)

Spage added a subscriber: Spage.EditedOct 8 2015, 7:57 PM
In T30085#1691932, @Tgr wrote:

(Subjective) summary of the IRC discussion: ...

Thanks! Everyone should read that.

This RFC was discussed in E74, a special IRC office hour 2015-10-01 0:00 UTC (America: late Wednesday). See meetbot log. @tstarling of the Architecture Committee commented

... I think it is a pretty simple feature and can be done long before December
we can continue this discussion on phabricator

One action item:

  • reading-infrastructure make sure email login is not a problem for T110278 and T110283 (@Tgr)

Assuming that email addresses are personal by default and assuming that this functionality would be only available to email addresses that have been confirmed, the use cases of someone running into a duplicate would be: (…)

  1. Evil vandal creating an account with the email of a legitimate user, who gets confused and is unable to log in.

I wasn't trying to provide the final copy, just to prove the point that a sensible error message is possible. Still:

This email address is connected to multiple accounts. Please log in with your desired username.

Doesn't this also disclose that such an email address exists? (only if the user has several accounts, but still…)

Regarding case sensitivity, I don't think we should normalise the emails unless it's a provider we know to be case-insensitive. And even if the email arrives in the same inbox, changing the case may bypass filters previously set. Not a big deal for each user, but there are 44 million users for each “little annoyance”.

Can someone get numbers of (authenticated) emails:

  • LOWER CASE
  • UPPER CASE
  • First uppercase
  • MiXed Case
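A rough way to produce such counts, given a list of addresses exported from the database, could look like the sketch below. The function names are hypothetical, and it assumes that only the local part (before the last "@") is classified, since the domain part is case-insensitive anyway.

```python
from collections import Counter


def case_category(email: str) -> str:
    """Classify the local part of an address by its letter casing."""
    local = email.rsplit("@", 1)[0]
    letters = [c for c in local if c.isalpha()]
    # Addresses without letters are counted as lower case.
    if not letters or all(c.islower() for c in letters):
        return "lower"
    if all(c.isupper() for c in letters):
        return "upper"
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return "first-upper"
    return "mixed"


def count_categories(emails):
    """Return a Counter mapping each category to its number of addresses."""
    return Counter(case_category(e) for e in emails)
```

For example, `count_categories(["joe@example.com", "Joe@example.com"])` yields one address in each of the "lower" and "first-upper" buckets.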

Assuming that email addresses are personal by default and assuming that this functionality would be only available to email addresses that have been confirmed, the use cases of someone running into a duplicate would be: (…)

  1. Evil vandal creating an account with the email of a legitimate user, who gets confused and is unable to log in.

I wasn't trying to provide the final copy, just to prove the point that a sensible error message is possible. Still:

This email address is connected to multiple accounts. Please log in with your desired username.

Doesn't this also disclose that such an email address exists? (only if the user has several accounts, but still…)

For both of these, I think not-- Since we only use authenticated emails, the victim would have to authenticate the account of the attacker before these work, right? But if I'm missing a way that could happen, that would be bad.

I wasn't trying to provide the final copy, just to prove the point that a sensible error message is possible. Still:

This email address is connected to multiple accounts. Please log in with your desired username.

Doesn't this also disclose that such an email address exists? (only if the user has several accounts, but still…)

For both of these, I think not-- Since we only use authenticated emails, the victim would have to authenticate the account of the attacker before these work, right? But if I'm missing a way that could happen, that would be bad.

Someone legitimately using a couple of accounts would have the email verified. That's not a reason to reveal that such email is used by someone in that wiki [family].

Tgr added a comment.Oct 8 2015, 8:42 PM

Since we only use authenticated emails, the victim would have to authenticate the account of the attacker before these work, right? But if I'm missing a way that could happen, that would be bad.

We store authenticated and pending addresses in the same DB field so the implementation would have to be careful, but a DoS of the email authentication is certainly avoidable. Note though that email verification is a configuration flag and theoretically there can be wikis not using it (in which case email login should probably just be disallowed altogether).

There is still an information disclosure concern though when an email address is legitimately used by multiple accounts (which is what I think Platonides was referring to). This is not uncommon, people can have test accounts, sockpuppets, bot accounts... We agreed, I think, that disclosing whether an address is in use via hard-to-do-and-probably-impractical timing attacks is not a big deal, but the login interface still shouldn't tell it outright.

We store authenticated and pending addresses in the same DB field so the implementation would have to be careful, but a DoS of the email authentication is certainly avoidable. Note though that email verification is a configuration flag and theoretically there can be wikis not using it (in which case email login should probably just be disallowed altogether).

That's a good point. I agree.

There is still an information disclosure concern though when an email address is legitimately used by multiple accounts (which is what I think Platonides was referring to). This is not uncommon, people can have test accounts, sockpuppets, bot accounts... We agreed, I think, that disclosing whether an address is in use via hard-to-do-and-probably-impractical timing attacks is not a big deal, but the login interface still shouldn't tell it outright.

Yes. Sorry, I missed the context of that original comment.

Assuming that email addresses are personal by default and assuming that this functionality would be only available to email addresses that have been confirmed, the use cases of someone running into a duplicate would be: (…)

  1. Evil vandal creating an account with the email of a legitimate user, who gets confused and is unable to log in.

If a user has control of the e-mail address/account, he or she can reset the password of the associated user account(s). This fact was mentioned in the RFC meeting, but it's worth repeating.

It's also worth reiterating that anyone can test whether an e-mail address works (is valid) simply by attempting to send an e-mail to it. This isn't exactly the same as revealing that a specific e-mail address is in use on a specific site, but it does eat away at some of the user privacy expectations arguments.

(In that same vein, we must recognize that user privacy expectations are often formed by how other sites behave and many of them are very carefree with revealing that an e-mail address is in use.)

Regarding case sensitivity, I don't think we should normalise the emails unless it's a provider we know to be case-insensitive. And even if the email arrives in the same inbox, changing the case may bypass filters previously set. Not a big deal for each user, but there are 44 million users for each “little annoyance”.

I don't really care one way or another about case preservation. I want "Joe@example.com" to be equivalent to "joe@example.com" throughout MediaWiki, when logging in or when updating/changing a user e-mail address.

Again, I'll ask for a specific example of a provider where case matters.

Can someone get numbers of (authenticated) emails:

  • LOWER CASE
  • UPPER CASE
  • First uppercase
  • MiXed Case

I imagine you meant "lower case" here. :-)

Tgr added a comment.Oct 9 2015, 6:43 AM

Again, I'll ask for a specific example of a provider where case matters.

You can find anecdotes with a little googling.

As long as we preserve the original case and always use that to send the mail, I don't think a case-insensitive match for logins is problematic in theory, but I don't know how easy that is in MySQL. You could do something like SELECT * FROM user WHERE user_email COLLATE utf8_unicode_ci = '<email>'; but I imagine it would impact performance...

devunt added a comment.Oct 9 2015, 8:20 AM
In T30085#1713635, @Tgr wrote:

We store authenticated and pending addresses in the same DB field so the implementation would have to be careful, but a DoS of the email authentication is certainly avoidable. Note though that email verification is a configuration flag and theoretically there can be wikis not using it (in which case email login should probably just be disallowed altogether).

Yes. I already have implemented that. At line 744 of SpecialUserlogin.php:

$useEmailLogin = $wgEmailAuthentication && $wgEnableEmailLogin && Sanitizer::validateEmail( $this->mUsername );
In T30085#1714553, @Tgr wrote:

As long as we preserve the original case and always use that to send the mail, I don't think a case-insensitive match for logins is problematic in theory, but I don't know how easy that is in MySQL. You could do something like SELECT * FROM user WHERE user_email COLLATE utf8_unicode_ci = '<email>'; but I imagine it would impact performance...

At lines 530-532 of User.php:

'lower(user_email)' => $email,
Tgr added a comment.Oct 9 2015, 9:47 PM

That will just turn the query into a full table scan.

EXPLAIN SELECT 1 FROM user WHERE user_email = 'gtisza@wikimedia.org';
+------+-------------+-------+------+---------------+------------+---------+-------+------+-------------+
| id   | select_type | table | type | possible_keys | key        | key_len | ref   | rows | Extra       |
+------+-------------+-------+------+---------------+------------+---------+-------+------+-------------+
|    1 | SIMPLE      | user  | ref  | user_email    | user_email | 53      | const |    1 | Using where |
+------+-------------+-------+------+---------------+------------+---------+-------+------+-------------+

EXPLAIN SELECT 1 FROM user WHERE lower(user_email) = 'gtisza@wikimedia.org';
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | user  | ALL  | NULL          | NULL | NULL    | NULL | 294989 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
Tgr added a subscriber: jcrespo.Oct 9 2015, 9:51 PM

At least with Wikimedia's configuration, email addresses are stored as binary strings. So you either need to change that or add a new index for lower(user_email). @jcrespo any thoughts?

@Tgr I'm not very familiar with SQL. Does SELECT * FROM user WHERE user_email COLLATE utf8_unicode_ci = '<email>'; work without problems? Adding a new function-based index, such as CREATE INDEX user_email_lowercase on user(lower(user_email));, is not supported in MySQL. Adding a new field that stores lowercased email addresses could be one way to solve it, but I think it might be overkill.

Tgr added a comment.Oct 9 2015, 10:47 PM

Ugh, I always forget MySQL does not support function-based indexes. user_email is binary at least on Wikimedia and on installations which don't change $wgDBTableOptions so collations cannot be used there. I guess the options are to add new column or to not make login case-insensitive, then.
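The "new column" option could look roughly like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL so that it runs anywhere; the `user_email_lower` column and its index name are hypothetical and not part of MediaWiki's schema. The original address is kept untouched for sending mail, and only the lowercased copy is used for lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL,
        user_email TEXT NOT NULL,        -- original case, used for sending mail
        user_email_lower TEXT NOT NULL   -- hypothetical shadow column for lookups
    );
    CREATE INDEX user_email_lower_idx ON user (user_email_lower);
    """
)


def add_user(name, email):
    # Store the address as given plus a lowercased copy for matching.
    conn.execute(
        "INSERT INTO user (user_name, user_email, user_email_lower) VALUES (?, ?, ?)",
        (name, email, email.lower()),
    )


def find_by_email(email):
    # Plain equality on the indexed shadow column, so no function call
    # is applied to the column inside the WHERE clause.
    return conn.execute(
        "SELECT user_name, user_email FROM user WHERE user_email_lower = ?",
        (email.lower(),),
    ).fetchall()


add_user("Example", "User@Example.com")

# Ask SQLite how it would run the lookup; the last field describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT user_name FROM user WHERE user_email_lower = ?",
    ("user@example.com",),
).fetchall()
```

The cost of this approach is the extra storage and the need to keep both columns in sync whenever an address changes.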

Tgr added a comment.Oct 9 2015, 10:54 PM

Or to change this column to use some case-insensitive collation, in which case SELECT ... WHERE user_email = '<email>' would just work. Don't know how difficult it is to change collation for a large table.

Tgr added a comment.Oct 9 2015, 11:08 PM

(FWIW MySQL 5.7.8 does support indexes on non-stored virtual columns, which is pretty much the same as a function-based index. But we support MySQL back to 5.0.2 so that doesn't make a difference.)

Regarding case sensitivity, I don't think we should normalise the emails unless it's a provider we know to be case-insensitive.

It's easier: do not hard-code provider-specific logic into MediaWiki.

the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

Someone could even rely on RFC 5321-compliant behavior to have the 'same' address interpreted as 'different' ones. And rightfully so.

As much as I would love to convert our tables to utf8mb4, that is not going to happen any time soon, at least not without a coordinated effort.

Please take into account that email addresses (the non-domain part) are case sensitive:
The local-part of a mailbox MUST BE treated as case sensitive.
A discussion that I think already took place when applying SUL.

If, and only if, we wanted to violate the standard, the right way to implement this would be to change all email addresses to lowercase in the first place.

Tgr added a comment.Oct 10 2015, 7:03 AM

You can only violate the SMTP standard by doing SMTP. If we converted the addresses to lowercase, we would be doing exactly that - whenever we send email via SMTP, we would use different case than we were provided with. Converting the table to case insensitive would mean that we still generate SMTP messages with the right case, and how the email addresses are used internally for login has nothing to do with RFC 5321.

Anyway, looks like case-insensitive login is not viable at this time.

Krinkle renamed this task from Allow user login with email address in addition to username to RFC: Allow user login with email address in addition to username.Feb 3 2016, 9:29 PM
Krinkle removed devunt as the assignee of this task.
Krinkle removed a project: Patch-For-Review.
RobLa-WMF mentioned this in Unknown Object (Event).Apr 13 2016, 6:54 PM
Tgr added a comment.Apr 13 2016, 9:19 PM

Needs a completely different implementation once AuthManager is enabled. Probably in AbstractPasswordPrimaryAuthenticationProvider.

Qgil removed a subscriber: Qgil.Apr 14 2016, 4:51 AM

My understanding of the status of this issue: the comment E74#1751 has the logs for the last IRC meeting we had. A long time ago, @Jaredzimmerman-WMF suggested this should get some love (see T30085#338321), but I'm not aware of any product plans around this these days.

RobLa-WMF mentioned this in Unknown Object (Event).Apr 20 2016, 6:43 AM
Elitre added a subscriber: Elitre.May 10 2017, 3:26 PM
Krinkle moved this task from (unused) to Under discussion on the TechCom-RFC board.
kchapman added a subscriber: kchapman.

Moving to TechCom-RFC backlog because this has unresolved issues due to the 1-to-many relation between addresses and usernames we have, which it seems we want to keep. So without a solution to that problem and without a product owner, we can’t move forward here.