00:01:23 <TimStarling> #startmeeting RFC meeting
00:01:23 <wm-labs-meetbot> Meeting started Thu Oct  1 00:01:23 2015 UTC and is due to finish in 60 minutes.  The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:01:23 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:01:23 <wm-labs-meetbot> The meeting name has been set to 'rfc_meeting'
00:01:48 <robla> #link https://phabricator.wikimedia.org/E74
00:02:15 <TimStarling> #topic Allow user login with email address in addition to username | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
00:03:11 <robla> #link https://phabricator.wikimedia.org/T30085
00:03:52 <TimStarling> devunt: how is the implementation going?
00:04:31 <TimStarling> I see there's been no new patchsets in the last week
00:04:53 <devunt> I was busy last week
00:04:58 <devunt> codes in the gerrit is the latest version
00:05:18 <TimStarling> fair enough
00:05:21 * robla waves at devunt
00:05:23 <TimStarling> csteipp: are you here?
00:05:45 <csteipp> Yeah
00:05:55 <Katie> TimStarling: Given that "@" is generally disallowed in usernames on MediaWiki wikis, what other issues are there from using a single field for both e-mail address or username?
00:06:33 * robla didn't think to put it on csteipp's calendar
00:06:36 <TimStarling> well, csteipp's concern with the original patchset was that attackers can use brute force to find valid email addresses
00:06:52 <TimStarling> if we give feedback about the email address being associated with a user
00:07:11 <TimStarling> that has been addressed, but it means losing some user feedback and thus usability
00:07:37 <TimStarling> if the user types their email address into the box, and their password was wrong, we can't tell them that, we can only say "the email or password was wrong"
00:08:15 <TimStarling> there is still the possibility of a timing attack, but maybe enough has been done for csteipp to remove his -2?
00:08:54 <csteipp> Yeah, I need to review the current patchset (sorry, haven't gotten to it)
00:09:19 <TimStarling> I've suggested some changes on gerrit and some more changes via private message
00:09:29 <csteipp> We're only checking accounts where there's exactly 1 account with that email, and the email is confirmed, right?
00:09:30 <bd808> not disclosing if the account exists or not is pretty standard on failed web logins I think. It shouldn't be a huge problem
00:09:31 <TimStarling> and my changes will probably make the timing attack more severe
00:09:32 <devunt> The current implementation does not have the possibility of a timing attack
00:09:37 <devunt> unless I did something wrong
00:09:46 <Katie> TimStarling: We're still gaining a lot of usability by offering login via e-mail address functionality.
00:10:04 <Katie> Like I see what you're saying, but I don't think the vague error messages matter very much.
00:10:16 <TimStarling> the question is: how do you log in with CentralAuth or some other auth plugin?
00:10:56 <TimStarling> csteipp should be thinking "holy crap" right now if he remembers what the CA code looks like ;)
00:11:01 <Katie> Are we keeping CentralAuth around?
00:11:17 <Katie> Does devunt care about supporting it?
00:11:19 <csteipp> Timing attack is someone can guess email address and (known wrong) password, and the check is slower if the email exists, right?
00:11:29 <devunt> I'm going to make some edits on my code to make it works with CentralAuth
00:11:34 <TimStarling> yes, we're keeping CA for now
00:13:05 <devunt> It checks the existence of an email address and if it doesn't exists, just hash a dummy string with the same process as saving a new password
00:13:23 <bd808> well....
00:13:23 <bd808> you wait for AuthManager :)
00:13:23 <bd808> oh yes
00:13:23 <bd808> CA is not going anywhere
00:13:24 <bd808> Even in the post-SUL world CA does a lot for us
00:13:26 <devunt> it should costs a same amount of time
00:13:41 <csteipp> devunt: cool!
00:13:50 * bd808__ switches to an account that isn't so horribly lagged
00:14:11 <TimStarling> so the design options for CA are either to extend the idea of login via email address into $wgAuth and the hook interface
00:14:31 <TimStarling> or to map the email address to a username early, and maintain the same login by username interface in the backend
00:14:48 <TimStarling> and I am leaning towards the latter since it is less code to touch
00:15:22 <bd808__> Anomie, tgr and I really want to get AuthManager done this quarter and it will open up more possibilities for this kind of alternate auth scenario
00:15:24 <TimStarling> but then you don't have a backend login attempt if there is no username
00:16:07 <Katie> bd808__: I don't think we should block devunt based on wants, even really wants.
00:16:36 <Katie> TimStarling: Mapping early seems fine.
00:17:12 <Katie> The idea in the initial implementation is to cover the most basic case, in my understanding. A unique e-mail address with a valid password.
00:17:29 <bd808__> it could be wedged into CA in the same places I wedged in the ability to login with a pre-SUL rename user name I think
00:17:55 <Katie> A unique e-mail address with a valid password using MediaWiki core.
00:17:56 <bd808__> it's gross but there is code there that does it
00:18:12 <csteipp> So yeah, figuring out how this goes into CentralAuth is going to be a challenge-- legoktm and I should probably figure that out. So this wont work on WMF sites until we do that.
00:18:38 <TimStarling> and devunt is doing this for WMF, so it is a blocker
00:18:59 <Katie> Oh, is he? I thought this was a volunteer project.
00:19:00 <devunt> mainly for WMF
00:19:25 <TimStarling> it is a volunteer project but volunteers always have specific motivations
00:20:21 <devunt> this is (obviously) based on the task T30085,
00:20:36 <tgr> AuthManager intends to fully replace wgAuth so it might be worth checking it out to avoid working twice
00:20:38 <devunt> and the author of that task said "especially on a big project like Wikimedia"
00:20:53 <csteipp> Good to know. So yeah, we'll have to figure how to make CA do similar-- is there a task for that part specifically?
00:21:04 <tgr> although I don't expect the actual authentication code that handles the email address would have to change much
00:21:18 <Katie> What needs to change in CentralAuth?
00:21:24 <Katie> If you map early, why does CentralAuth care?
00:21:42 <TimStarling> CA manages email addresses, the user_email field is not used
00:21:51 <csteipp> ^ that
00:22:05 <TimStarling> so if we map early, CA needs to hook into the mapping function
00:22:07 <Katie> Oh, really? Ugh.
00:22:09 <csteipp> Have to get CA's version of the email searchable
00:22:23 <devunt> So I'm going to mapping function hookable
00:22:25 <csteipp> Because on most wikis, global accounts *shoudn't* have an email in the local wiki DB
00:22:32 <devunt> for CentralAuth or any other plugins
00:22:33 <csteipp> (there's a bug for that)
00:22:42 <TimStarling> if you want a good reason for mapping early, maybe look at autocreation
00:22:44 <Katie> Not to get off-topic too much, but why is that a good thing?
00:23:12 <Katie> Like why does CentralAuth, in a post-SUL world, need to do its own special thing here and store e-mail addresses separately?
00:23:16 <csteipp> Limit the number of places we store private data
00:23:33 <TimStarling> autocreation by email has to work, if you allow login by email, that is critical for usability
00:23:40 <tgr> in general mapping early seems like the wrong way to this
00:23:46 * csteipp has to leave.. I think the general direction of the patch now looks good, so -2 removed.
00:24:02 <tgr> how do you avoid timing attacks if you can't even map non-existent email addresses to users?
00:24:03 <csteipp> ^ for core. We need to figure out CA.
00:24:12 <devunt> csteipp, thanks!
00:24:21 <devunt> And about user experiences,
00:24:54 <TimStarling> I don't think you can completely avoid timing attacks
00:25:08 <devunt> I initially designed the messages to work as same as github does.
00:25:17 <TimStarling> but CA could provide a dummy-login function which is called when there is no username for an email address
00:26:21 <tgr> anyway AuthManager when used properly will separate most concerns (such as autocreation) so you would only have to write the code to detect when the content of the username field is an email address and fetch the (central)user object based on that
00:26:53 <tgr> that would also be reasonably safe timing-wise, assuming that searching the DB by username and by email take a similar amount of time
00:27:38 <tgr> on second thought you don't really need to assume that
00:28:24 <bd808__> as long as the timing for the email path is always the same you are ok for timing attacks I think
00:28:25 <TimStarling> to be safe from timing attacks, the search function needs to be equally fast when there are zero and one matches
00:28:43 <TimStarling> how do you take out the mysql row construction overhead?
00:28:54 <Katie> Can we buffer the time?
00:28:56 <devunt> I thought we can ignore db querying time, can't we?
00:29:06 <TimStarling> I think it's really not critical, security-wise
00:29:14 <tgr> create a dummy row, make it 1 or 2 results instead?
00:29:45 <bd808__> User::newFromEmail() as written now will leak how many accounts are attached I think
00:29:46 <TimStarling> like I say, I don't think you can eliminate timing altogether
00:29:52 <TimStarling> I think you can try to limit it to say 1ms
00:30:00 <tgr> or use a LIMIT to make it always return one result which is sometimes a dummy with a password that never matches
00:30:20 <TimStarling> then you can start to talk about realistic architectures instead of accounting for every clock cycle
00:31:06 <TimStarling> what we want is for the time difference to be less than the typical timing noise
00:31:25 <devunt> I think variations of server response time is much larger than variations of sql querying time, so we can ignore them.
00:31:40 <TimStarling> I mean, we are only talking about leaking email addresses, which MW does routinely via Special:EmailUser
00:32:22 <TimStarling> we also publish people's email addresses in various other places, like mailing lists and gerrit
00:32:47 <tgr> devunt: as long as that variation has nice properties (e.g. close to normal distribution) an attacker can filter it out
00:33:12 <tgr> I'm not claiming that is an issue significant enough to block the change, just saying that it exists
00:33:15 <TimStarling> the question is whether a spammer would bother to run a thousand queries against a given email address, to average out the times and try to derive a measure of probability
00:33:25 <TimStarling> or whether they would find easier targets
00:34:10 <tgr> TimStarling: I think the more problematic attack model is trying to deanonimize users when you have a suspicion who they might be and what email address they might be using
00:34:11 <TimStarling> oh, and if an attacker can run thousands of login attempts, would they not try brute-forcing the password?
00:34:21 <TimStarling> logins are already rate-limited by IP
00:34:57 <TimStarling> well, if you suspect what their email address is, you could just send mail to it
00:35:16 <TimStarling> if you have 100 ideas about what their email address might be, you could send mail to all of them
00:36:06 <TimStarling> anyway, that is my position, be practical, benchmark it, don't try to count every cycle
00:36:31 <TimStarling> if the benchmarks show a problem then we can find a solution
00:36:47 <devunt> I agree with TimStarling that there is no way to mitigate timing attack completely
00:38:58 <devunt> and there is more than a hundreds of thousands ways to find a someone's email address to make a list of email addresses for spamming
00:39:11 <TimStarling> #info <csteipp> So yeah, figuring out how this goes into CentralAuth is going to be a challenge-- legoktm and I should probably figure that out.
00:39:36 <robla> devunt: I think there are privacy laws that don't acknowledge your point
00:39:53 <Katie> And?
00:40:06 <robla> Katie: laws we need to comply with
00:40:15 <TimStarling> there are privacy laws in the US?
00:40:25 <Katie> If the legal team has a problem, I have faith in them to say something.
00:41:10 <robla> TimStarling: IANAL....I do know the state of California has many
00:41:14 <tgr> again, you might want to make this a part of the AuthManager rewrite instead of doing it twice
00:41:28 <TimStarling> well, if we are claiming that the email address is private data, we should stop doing that
00:41:29 <Katie> What is AuthManager's status?
00:41:46 <TimStarling> since like I say, we give it out routinely via Special:EmailUser
00:41:47 <tgr> https://phabricator.wikimedia.org/T110283 is the rewrite task for CentralAuth although it's not very informative currently
00:42:15 <tgr> it's a Q2 goal I believe - bd808 / bd808__ are those final now?
00:42:24 * robla doesn't know exactly what our privacy policy says about email addresses as PII
00:42:24 <Katie> Q2 means quarter two of what?
00:42:50 <bd808__> by the end of the calendar year (Q2 of WMF fiscal)
00:42:52 <tgr> sorry, that's Oct-Dec this year
00:43:26 <Katie> MediaWiki won't be the, uhh, first software package to have login via e-mail address functionality. The laws of California really aren't relevant here unless the Wikimedia Foundation legal team comes forward and says so.
00:44:26 <devunt> do we have to rewrite this login-via-email codes after AuthManager was implemented in core?
00:44:36 <TimStarling> oh, I see we've lost csteipp
00:44:37 <tgr> again, I wouldn't worry about spammers, I would worry about, say, some Chinese dissident being identified based on email address
00:44:50 <tgr> but TimStarling is right that there are much easier attacks to make
00:44:55 <Katie> MediaWiki's login usability is horrible. We need login via e-mail address and hopefully case-insensitive login one day. I think timing attacks and legal arguments are mostly silly distractions.
00:45:03 <bd808__> I helped give voice to them, but I think the timing attack problem is mostly FUD
00:45:27 <Katie> We can reasonably mitigate, but yeah.
00:45:37 <TimStarling> so csteipp thinks he and legoktm should have input into the CA interface
00:45:38 <bd808__> In practice account disclosure on a public wiki is not a huge problem
00:45:55 <tgr> devunt: all authentication code has to be rewritten to adopt AuthManager
00:46:04 <tgr> well, not so much rewritten as reorganized
00:46:20 <devunt> and AuthManager is going to be implemented in this winter?
00:46:42 <tgr> that's the plan I believe
00:46:55 <robla> bd808__: good point about timing attack stuff.  I just wanted to make sure we didn't glibly dismiss emails as public info, which it sounds like we aren't
00:47:00 <TimStarling> so tgr, you are telling devunt to wait, not to implement and migrate?
00:47:09 <Katie> I really, really don't want to see work here stall as the result of potential future goals.
00:47:31 <TimStarling> because I think it is a pretty simple feature and can be done long before December
00:47:50 <bd808__> AuthManager has been in progress for the last 9 months at various rates of speed. Anomie, Tgr and I will be focusing on it as our major body of work for the next 3 months. That should be enough to get it into production
00:48:50 <bd808__> TimStarling: I'd say go ahead with it. we can catch authmanager up later
00:49:08 <devunt> T30085 can be done in 2 weeks or before the ends of this month, I think.
00:49:13 <tgr> TimStarling: not necessarily, but he should look at the AM project and check it's not done in a way that's going to be hard to port
00:49:16 <bd808__> but in theory things like this will be easier to accomplish in the world where authmanager exists
00:49:51 <Katie> In the future, everything will be simpler and better, yes. Today, it'd be really nice to have working login via e-mail address. :-)
00:49:58 <TimStarling> well, it sounds like login by email should be included in the AM design as soon as possible
00:50:32 <tgr> the design is very generic and definitely allows for it
00:51:00 <tgr> it basically makes it the responsibility of the auth plugin what kind of fields they would like to manage
00:51:14 <TimStarling> that is an action item for someone
00:51:23 <bd808__> in the AM model it would be up to each primary provider (eg CA) to handle the input. It breaks things more cleanly than the core code does today
00:51:53 <TimStarling> devunt's patch will have frontend stuff, like changing form labels and error messages, which you will need anyway
00:52:27 <devunt> And I can re-use the codes when migrating this feature into AM after AM is implmented
00:52:54 <TimStarling> the part which needs to be rewritten should be <100 lines
00:53:19 <devunt> thus I think writting codes now is not a waste of time, though
00:53:38 <devunt> Am I wrong about this?
00:53:55 <tgr> #action reading-infrastructure make sure email login is not a problem for T110278 and T110283
00:54:12 <bd808__> devunt: no. I think you should cary on while you have the time and inclination to work on the feature
00:54:53 <TimStarling> ok, anything else in the remaining 5 minutes?
00:55:17 <TimStarling> I think we are pretty much all done
00:55:26 <Katie> devunt: I'm really glad you're working on this. I think it will be a major improvement in MediaWiki's login usability.
00:55:47 <robla> agree with Katie, this will be great!
00:55:49 <devunt> we still have one major problem
00:56:07 <devunt> mediawiki allows duplication of email address
00:56:10 <devunt> how can I handle it?
00:56:32 <TimStarling> I have an idea for this
00:56:44 <legoktm> sorry, I'm late. /me reads up
00:56:54 <tgr> rejecting multi-user emails is good for the first version IMO
00:56:56 <TimStarling> presumably when the same email address is used, it is usually the same person
00:57:02 <Katie> I agree with tgr.
00:57:16 <TimStarling> and thus, in most cases, the same password
00:57:32 <bd808__> legoktm: tl;dr csteipp volunteered you to help figure out CA intergration :)
00:57:42 <TimStarling> so if there is a duplicate email, we can just pick an account randomly and validate the supplied password against it
00:58:15 <TimStarling> if it matches, then we can report a user-friendly error message
00:58:15 <devunt> the current implementation will only process if email address doesn't connected with multiple accounts
00:58:51 <TimStarling> like "this email address is used by more than one account, please log in with your username"
00:59:26 <legoktm> a good chunk of CA is just core copied into a hook that uses a different database
00:59:31 <TimStarling> if the password doesn't match, then we need to give the generic WRONG_INPUT message which doesn't disclose the presence of any accounts
00:59:31 <devunt> I think it is bad pattern in some ways
00:59:36 <legoktm> I don't imagine it will be difficult, just the timing stuff
00:59:58 <Katie> TimStarling: People may still complain about that type of error message violating user privacy.
01:00:03 <devunt> choosing randomly is not a good idea to me
01:00:17 <devunt> and I think it is not a good design pattern
01:00:23 <legoktm> +1 on skipping the multiple users having the same email thing for now
01:00:40 <Katie> We can just reject for now and figure out the harder cases later, right?
01:00:45 <TimStarling> well, if two people share the same email address, they can reset each others' passwords if they like
01:01:04 <TimStarling> they can't really keep anything private from each other
01:01:23 <Katie> Hmmm, right. You said only if the password works. That seems fine.
01:01:53 <Katie> We need a good test suite here. There are a lot of cases.
01:01:53 <tgr> the only drawback is that if the password is not the same everywhere you get somewhat random behavior
01:01:57 <TimStarling> yeah, if the password doesn't match, you have to assume it is an attacker scanning for valid email addresses
01:02:37 <devunt> yes I was worried about what tgr mentioned
01:02:45 <tgr> which I don't think is unusual e.g. people use the same email for their bot but set a different password
01:02:52 <tgr> probably not a big deal though
01:03:06 <legoktm> or people who use the same email and password for multiple bots >.>
01:03:37 <devunt> I think we have a basicially 3 choices here
01:04:03 <devunt> 1. Only accepts if there is only one email address in the database
01:04:26 <devunt> 2. Checking the password of randomly selected account (as TimStarling mentioned)
01:04:47 <devunt> 3. Checking passwords of all accounts that have the same email address
01:04:51 <tgr> TimStarling's suggestion is still 1 just with an improved error message
01:05:04 <TimStarling> we can continue this discussion on phabricator
01:05:24 <TimStarling> I am fine with devunt's implementation (1) for now
01:05:29 <devunt> tgr, oh, yes, it is.
01:05:38 * robla ducks out
01:05:54 <TimStarling> #endmeeting