00:01:23 #startmeeting RFC meeting 00:01:23 Meeting started Thu Oct 1 00:01:23 2015 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot. 00:01:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 00:01:23 The meeting name has been set to 'rfc_meeting' 00:01:48 #link https://phabricator.wikimedia.org/E74 00:02:15 #topic Allow user login with email address in addition to username | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ 00:03:11 #link https://phabricator.wikimedia.org/T30085 00:03:52 devunt: how is the implementation going? 00:04:31 I see there's been no new patchsets in the last week 00:04:53 I was busy last week 00:04:58 codes in the gerrit is the latest version 00:05:18 fair enough 00:05:21 * robla waves at devunt 00:05:23 csteipp: are you here? 00:05:45 Yeah 00:05:55 TimStarling: Given that "@" is generally disallowed in usernames on MediaWiki wikis, what other issues are there from using a single field for both e-mail address or username? 00:06:33 * robla didn't think to put it on csteipp's calendar 00:06:36 well, csteipp's concern with the original patchset was that attackers can use brute force to find valid email addresses 00:06:52 if we give feedback about the email address being associated with a user 00:07:11 that has been addressed, but it means losing some user feedback and thus usability 00:07:37 if the user types their email address into the box, and their password was wrong, we can't tell them that, we can only say "the email or password was wrong" 00:08:15 there is still the possibility of a timing attack, but maybe enough has been done for csteipp to remove his -2? 00:08:54 Yeah, I need to review the current patchset (sorry, haven't gotten to it) 00:09:19 I've suggested some changes on gerrit and some more changes via private message 00:09:29 We're only checking accounts where there's exactly 1 account with that email, and the email is confirmed, right? 00:09:30 not disclosing if the account exists or not is pretty standard on failed web logins I think. It shouldn't be a huge problem 00:09:31 and my changes will probably make the timing attack more severe 00:09:32 The current implementation does not have the possibility of a timing attack 00:09:37 unless I did something wrong 00:09:46 TimStarling: We're still gaining a lot of usability by offering login via e-mail address functionality. 00:10:04 Like I see what you're saying, but I don't think the vague error messages matter very much. 00:10:16 the question is: how do you log in with CentralAuth or some other auth plugin? 00:10:56 csteipp should be thinking "holy crap" right now if he remembers what the CA code looks like ;) 00:11:01 Are we keeping CentralAuth around? 00:11:17 Does devunt care about supporting it? 00:11:19 Timing attack is someone can guess email address and (known wrong) password, and the check is slower if the email exists, right? 00:11:29 I'm going to make some edits on my code to make it works with CentralAuth 00:11:34 yes, we're keeping CA for now 00:13:05 It checks the existence of an email address and if it doesn't exists, just hash a dummy string with the same process as saving a new password 00:13:23 well.... 00:13:23 you wait for AuthManager :) 00:13:23 oh yes 00:13:23 CA is not going anywhere 00:13:24 Even in the post-SUL world CA does a lot for us 00:13:26 it should costs a same amount of time 00:13:41 devunt: cool! 00:13:50 * bd808__ switches to an account that isn't so horribly lagged 00:14:11 so the design options for CA are either to extend the idea of login via email address into $wgAuth and the hook interface 00:14:31 or to map the email address to a username early, and maintain the same login by username interface in the backend 00:14:48 and I am leaning towards the latter since it is less code to touch 00:15:22 Anomie, tgr and I really want to get AuthManager done this quarter and it will open up more possibilities for this kind of alternate auth scenario 00:15:24 but then you don't have a backend login attempt if there is no username 00:16:07 bd808__: I don't think we should block devunt based on wants, even really wants. 00:16:36 TimStarling: Mapping early seems fine. 00:17:12 The idea in the initial implementation is to cover the most basic case, in my understanding. A unique e-mail address with a valid password. 00:17:29 it could be wedged into CA in the same places I wedged in the ability to login with a pre-SUL rename user name I think 00:17:55 A unique e-mail address with a valid password using MediaWiki core. 00:17:56 it's gross but there is code there that does it 00:18:12 So yeah, figuring out how this goes into CentralAuth is going to be a challenge-- legoktm and I should probably figure that out. So this wont work on WMF sites until we do that. 00:18:38 and devunt is doing this for WMF, so it is a blocker 00:18:59 Oh, is he? I thought this was a volunteer project. 00:19:00 mainly for WMF 00:19:25 it is a volunteer project but volunteers always have specific motivations 00:20:21 this is (obviously) based on the task T30085, 00:20:36 AuthManager intends to fully replace wgAuth so it might be worth checking it out to avoid working twice 00:20:38 and the author of that task said "especially on a big project like Wikimedia" 00:20:53 Good to know. So yeah, we'll have to figure how to make CA do similar-- is there a task for that part specifically? 00:21:04 although I don't expect the actual authentication code that handles the email address would have to change much 00:21:18 What needs to change in CentralAuth? 00:21:24 If you map early, why does CentralAuth care? 00:21:42 CA manages email addresses, the user_email field is not used 00:21:51 ^ that 00:22:05 so if we map early, CA needs to hook into the mapping function 00:22:07 Oh, really? Ugh. 00:22:09 Have to get CA's version of the email searchable 00:22:23 So I'm going to mapping function hookable 00:22:25 Because on most wikis, global accounts *shoudn't* have an email in the local wiki DB 00:22:32 for CentralAuth or any other plugins 00:22:33 (there's a bug for that) 00:22:42 if you want a good reason for mapping early, maybe look at autocreation 00:22:44 Not to get off-topic too much, but why is that a good thing? 00:23:12 Like why does CentralAuth, in a post-SUL world, need to do its own special thing here and store e-mail addresses separately? 00:23:16 Limit the number of places we store private data 00:23:33 autocreation by email has to work, if you allow login by email, that is critical for usability 00:23:40 in general mapping early seems like the wrong way to this 00:23:46 * csteipp has to leave.. I think the general direction of the patch now looks good, so -2 removed. 00:24:02 how do you avoid timing attacks if you can't even map non-existent email addresses to users? 00:24:03 ^ for core. We need to figure out CA. 00:24:12 csteipp, thanks! 00:24:21 And about user experiences, 00:24:54 I don't think you can completely avoid timing attacks 00:25:08 I initially designed the messages to work as same as github does. 00:25:17 but CA could provide a dummy-login function which is called when there is no username for an email address 00:26:21 anyway AuthManager when used properly will separate most concerns (such as autocreation) so you would only have to write the code to detect when the content of the username field is an email address and fetch the (central)user object based on that 00:26:53 that would also be reasonably safe timing-wise, assuming that searching the DB by username and by email take a similar amount of time 00:27:38 on second thought you don't really need to assume that 00:28:24 as long as the timing for the email path is always the same you are ok for timing attacks I think 00:28:25 to be safe from timing attacks, the search function needs to be equally fast when there are zero and one matches 00:28:43 how do you take out the mysql row construction overhead? 00:28:54 Can we buffer the time? 00:28:56 I thought we can ignore db querying time, can't we? 00:29:06 I think it's really not critical, security-wise 00:29:14 create a dummy row, make it 1 or 2 results instead? 00:29:45 User::newFromEmail() as written now will leak how many accounts are attached I think 00:29:46 like I say, I don't think you can eliminate timing altogether 00:29:52 I think you can try to limit it to say 1ms 00:30:00 or use a LIMIT to make it always return one result which is sometimes a dummy with a password that never matches 00:30:20 then you can start to talk about realistic architectures instead of accounting for every clock cycle 00:31:06 what we want is for the time difference to be less than the typical timing noise 00:31:25 I think variations of server response time is much larger than variations of sql querying time, so we can ignore them. 00:31:40 I mean, we are only talking about leaking email addresses, which MW does routinely via Special:EmailUser 00:32:22 we also publish people's email addresses in various other places, like mailing lists and gerrit 00:32:47 devunt: as long as that variation has nice properties (e.g. close to normal distribution) an attacker can filter it out 00:33:12 I'm not claiming that is an issue significant enough to block the change, just saying that it exists 00:33:15 the question is whether a spammer would bother to run a thousand queries against a given email address, to average out the times and try to derive a measure of probability 00:33:25 or whether they would find easier targets 00:34:10 TimStarling: I think the more problematic attack model is trying to deanonimize users when you have a suspicion who they might be and what email address they might be using 00:34:11 oh, and if an attacker can run thousands of login attempts, would they not try brute-forcing the password? 00:34:21 logins are already rate-limited by IP 00:34:57 well, if you suspect what their email address is, you could just send mail to it 00:35:16 if you have 100 ideas about what their email address might be, you could send mail to all of them 00:36:06 anyway, that is my position, be practical, benchmark it, don't try to count every cycle 00:36:31 if the benchmarks show a problem then we can find a solution 00:36:47 I agree with TimStarling that there is no way to mitigate timing attack completely 00:38:58 and there is more than a hundreds of thousands ways to find a someone's email address to make a list of email addresses for spamming 00:39:11 #info So yeah, figuring out how this goes into CentralAuth is going to be a challenge-- legoktm and I should probably figure that out. 00:39:36 devunt: I think there are privacy laws that don't acknowledge your point 00:39:53 And? 00:40:06 Katie: laws we need to comply with 00:40:15 there are privacy laws in the US? 00:40:25 If the legal team has a problem, I have faith in them to say something. 00:41:10 TimStarling: IANAL....I do know the state of California has many 00:41:14 again, you might want to make this a part of the AuthManager rewrite instead of doing it twice 00:41:28 well, if we are claiming that the email address is private data, we should stop doing that 00:41:29 What is AuthManager's status? 00:41:46 since like I say, we give it out routinely via Special:EmailUser 00:41:47 https://phabricator.wikimedia.org/T110283 is the rewrite task for CentralAuth although it's not very informative currently 00:42:15 it's a Q2 goal I believe - bd808 / bd808__ are those final now? 00:42:24 * robla doesn't know exactly what our privacy policy says about email addresses as PII 00:42:24 Q2 means quarter two of what? 00:42:50 by the end of the calendar year (Q2 of WMF fiscal) 00:42:52 sorry, that's Oct-Dec this year 00:43:26 MediaWiki won't be the, uhh, first software package to have login via e-mail address functionality. The laws of California really aren't relevant here unless the Wikimedia Foundation legal team comes forward and says so. 00:44:26 do we have to rewrite this login-via-email codes after AuthManager was implemented in core? 00:44:36 oh, I see we've lost csteipp 00:44:37 again, I wouldn't worry about spammers, I would worry about, say, some Chinese dissident being identified based on email address 00:44:50 but TimStarling is right that there are much easier attacks to make 00:44:55 MediaWiki's login usability is horrible. We need login via e-mail address and hopefully case-insensitive login one day. I think timing attacks and legal arguments are mostly silly distractions. 00:45:03 I helped give voice to them, but I think the timing attack problem is mostly FUD 00:45:27 We can reasonably mitigate, but yeah. 00:45:37 so csteipp thinks he and legoktm should have input into the CA interface 00:45:38 In practice account disclosure on a public wiki is not a huge problem 00:45:55 devunt: all authentication code has to be rewritten to adopt AuthManager 00:46:04 well, not so much rewritten as reorganized 00:46:20 and AuthManager is going to be implemented in this winter? 00:46:42 that's the plan I believe 00:46:55 bd808__: good point about timing attack stuff. I just wanted to make sure we didn't glibly dismiss emails as public info, which it sounds like we aren't 00:47:00 so tgr, you are telling devunt to wait, not to implement and migrate? 00:47:09 I really, really don't want to see work here stall as the result of potential future goals. 00:47:31 because I think it is a pretty simple feature and can be done long before December 00:47:50 AuthManager has been in progress for the last 9 months at various rates of speed. Anomie, Tgr and I will be focusing on it as our major body of work for the next 3 months. That should be enough to get it into production 00:48:50 TimStarling: I'd say go ahead with it. we can catch authmanager up later 00:49:08 T30085 can be done in 2 weeks or before the ends of this month, I think. 00:49:13 TimStarling: not necessarily, but he should look at the AM project and check it's not done in a way that's going to be hard to port 00:49:16 but in theory things like this will be easier to accomplish in the world where authmanager exists 00:49:51 In the future, everything will be simpler and better, yes. Today, it'd be really nice to have working login via e-mail address. :-) 00:49:58 well, it sounds like login by email should be included in the AM design as soon as possible 00:50:32 the design is very generic and definitely allows for it 00:51:00 it basically makes it the responsibility of the auth plugin what kind of fields they would like to manage 00:51:14 that is an action item for someone 00:51:23 in the AM model it would be up to each primary provider (eg CA) to handle the input. It breaks things more cleanly than the core code does today 00:51:53 devunt's patch will have frontend stuff, like changing form labels and error messages, which you will need anyway 00:52:27 And I can re-use the codes when migrating this feature into AM after AM is implmented 00:52:54 the part which needs to be rewritten should be <100 lines 00:53:19 thus I think writting codes now is not a waste of time, though 00:53:38 Am I wrong about this? 00:53:55 #action reading-infrastructure make sure email login is not a problem for T110278 and T110283 00:54:12 devunt: no. I think you should cary on while you have the time and inclination to work on the feature 00:54:53 ok, anything else in the remaining 5 minutes? 00:55:17 I think we are pretty much all done 00:55:26 devunt: I'm really glad you're working on this. I think it will be a major improvement in MediaWiki's login usability. 00:55:47 agree with Katie, this will be great! 00:55:49 we still have one major problem 00:56:07 mediawiki allows duplication of email address 00:56:10 how can I handle it? 00:56:32 I have an idea for this 00:56:44 sorry, I'm late. /me reads up 00:56:54 rejecting multi-user emails is good for the first version IMO 00:56:56 presumably when the same email address is used, it is usually the same person 00:57:02 I agree with tgr. 00:57:16 and thus, in most cases, the same password 00:57:32 legoktm: tl;dr csteipp volunteered you to help figure out CA intergration :) 00:57:42 so if there is a duplicate email, we can just pick an account randomly and validate the supplied password against it 00:58:15 if it matches, then we can report a user-friendly error message 00:58:15 the current implementation will only process if email address doesn't connected with multiple accounts 00:58:51 like "this email address is used by more than one account, please log in with your username" 00:59:26 a good chunk of CA is just core copied into a hook that uses a different database 00:59:31 if the password doesn't match, then we need to give the generic WRONG_INPUT message which doesn't disclose the presence of any accounts 00:59:31 I think it is bad pattern in some ways 00:59:36 I don't imagine it will be difficult, just the timing stuff 00:59:58 TimStarling: People may still complain about that type of error message violating user privacy. 01:00:03 choosing randomly is not a good idea to me 01:00:17 and I think it is not a good design pattern 01:00:23 +1 on skipping the multiple users having the same email thing for now 01:00:40 We can just reject for now and figure out the harder cases later, right? 01:00:45 well, if two people share the same email address, they can reset each others' passwords if they like 01:01:04 they can't really keep anything private from each other 01:01:23 Hmmm, right. You said only if the password works. That seems fine. 01:01:53 We need a good test suite here. There are a lot of cases. 01:01:53 the only drawback is that if the password is not the same everywhere you get somewhat random behavior 01:01:57 yeah, if the password doesn't match, you have to assume it is an attacker scanning for valid email addresses 01:02:37 yes I was worried about what tgr mentioned 01:02:45 which I don't think is unusual e.g. people use the same email for their bot but set a different password 01:02:52 probably not a big deal though 01:03:06 or people who use the same email and password for multiple bots >.> 01:03:37 I think we have a basicially 3 choices here 01:04:03 1. Only accepts if there is only one email address in the database 01:04:26 2. Checking the password of randomly selected account (as TimStarling mentioned) 01:04:47 3. Checking passwords of all accounts that have the same email address 01:04:51 TimStarling's suggestion is still 1 just with an improved error message 01:05:04 we can continue this discussion on phabricator 01:05:24 I am fine with devunt's implementation (1) for now 01:05:29 tgr, oh, yes, it is. 01:05:38 * robla ducks out 01:05:54 #endmeeting