Extra RFC Meeting on #wikimedia-office IRC channel (2015-10-01 UTC/2015-09-30 PDT)

Hosted by RobLa-WMF on Oct 1 2015, 12:00 AM - 1:00 AM.


We haven't coordinated this time with @devunt, so this meeting may be rescheduled for them.

Event Timeline

RobLa-WMF renamed this event to Extra RFC Meeting on #wikimedia-office IRC channel (2015-09-31 UTC/2015-09-30 PDT). Sep 30 2015, 1:33 AM
RobLa-WMF set the start date for this event to Oct 1 2015, 12:00 AM.
RobLa-WMF set the end date for this event to Oct 1 2015, 1:00 AM.
RobLa-WMF invited: ; uninvited: .
RobLa-WMF updated the event description.
RobLa-WMF added a project: TechCom-RFC.
RobLa-WMF updated the event description.

Hrm. As with T30085#1688185. Maybe "2015-09-31 0:00 UTC" is some kind of convention I'm simply unfamiliar with, but it flies directly in the face of https://en.wikipedia.org/wiki/Thirty_days_hath_September and I find it incredibly confusing.

RobLa-WMF renamed this event from Extra RFC Meeting on #wikimedia-office IRC channel (2015-09-31 UTC/2015-09-30 PDT) to Extra RFC Meeting on #wikimedia-office IRC channel (2015-10-01 UTC/2015-09-30 PDT). Sep 30 2015, 4:29 AM
RobLa-WMF updated the event description.
devunt invited: ; uninvited: . Sep 30 2015, 4:42 AM

Belated posting of links:
[01:05:55] <wm-labs-meetbot> Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-01-00.01.html
[01:05:55] <wm-labs-meetbot> Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-01-00.01.txt
[01:05:55] <wm-labs-meetbot> Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-01-00.01.wiki
[01:05:55] <wm-labs-meetbot> Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-10-01-00.01.log.html

00:01:23 <TimStarling> #startmeeting RFC meeting
00:01:23 <wm-labs-meetbot> Meeting started Thu Oct 1 00:01:23 2015 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:01:23 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:01:23 <wm-labs-meetbot> The meeting name has been set to 'rfc_meeting'
00:01:48 <robla> #link https://phabricator.wikimedia.org/E74
00:02:15 <TimStarling> #topic Allow user login with email address in addition to username | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
00:03:11 <robla> #link https://phabricator.wikimedia.org/T30085
00:03:52 <TimStarling> devunt: how is the implementation going?
00:04:31 <TimStarling> I see there's been no new patchsets in the last week
00:04:53 <devunt> I was busy last week
00:04:58 <devunt> codes in the gerrit is the latest version
00:05:18 <TimStarling> fair enough
00:05:21 * robla waves at devunt
00:05:23 <TimStarling> csteipp: are you here?
00:05:45 <csteipp> Yeah
00:05:55 <Katie> TimStarling: Given that "@" is generally disallowed in usernames on MediaWiki wikis, what other issues are there from using a single field for both e-mail address or username?
00:06:33 * robla didn't think to put it on csteipp's calendar
00:06:36 <TimStarling> well, csteipp's concern with the original patchset was that attackers can use brute force to find valid email addresses
00:06:52 <TimStarling> if we give feedback about the email address being associated with a user
00:07:11 <TimStarling> that has been addressed, but it means losing some user feedback and thus usability
00:07:37 <TimStarling> if the user types their email address into the box, and their password was wrong, we can't tell them that, we can only say "the email or password was wrong"
00:08:15 <TimStarling> there is still the possibility of a timing attack, but maybe enough has been done for csteipp to remove his -2?
00:08:54 <csteipp> Yeah, I need to review the current patchset (sorry, haven't gotten to it)
00:09:19 <TimStarling> I've suggested some changes on gerrit and some more changes via private message
00:09:29 <csteipp> We're only checking accounts where there's exactly 1 account with that email, and the email is confirmed, right?
00:09:30 <bd808> not disclosing if the account exists or not is pretty standard on failed web logins I think. It shouldn't be a huge problem
00:09:31 <TimStarling> and my changes will probably make the timing attack more severe
00:09:32 <devunt> The current implementation does not have the possibility of a timing attack
00:09:37 <devunt> unless I did something wrong
00:09:46 <Katie> TimStarling: We're still gaining a lot of usability by offering login via e-mail address functionality.
00:10:04 <Katie> Like I see what you're saying, but I don't think the vague error messages matter very much.
00:10:16 <TimStarling> the question is: how do you log in with CentralAuth or some other auth plugin?
00:10:56 <TimStarling> csteipp should be thinking "holy crap" right now if he remembers what the CA code looks like ;)
00:11:01 <Katie> Are we keeping CentralAuth around?
00:11:17 <Katie> Does devunt care about supporting it?
00:11:19 <csteipp> Timing attack is someone can guess email address and (known wrong) password, and the check is slower if the email exists, right?
00:11:29 <devunt> I'm going to make some edits on my code to make it works with CentralAuth
00:11:34 <TimStarling> yes, we're keeping CA for now
00:13:05 <devunt> It checks the existence of an email address and if it doesn't exists, just hash a dummy string with the same process as saving a new password
00:13:23 <bd808> well....
00:13:23 <bd808> you wait for AuthManager :)
00:13:23 <bd808> oh yes
00:13:23 <bd808> CA is not going anywhere
00:13:24 <bd808> Even in the post-SUL world CA does a lot for us
00:13:26 <devunt> it should costs a same amount of time
00:13:41 <csteipp> devunt: cool!
00:13:50 * bd808__ switches to an account that isn't so horribly lagged
00:14:11 <TimStarling> so the design options for CA are either to extend the idea of login via email address into $wgAuth and the hook interface
00:14:31 <TimStarling> or to map the email address to a username early, and maintain the same login by username interface in the backend
00:14:48 <TimStarling> and I am leaning towards the latter since it is less code to touch
00:15:22 <bd808__> Anomie, tgr and I really want to get AuthManager done this quarter and it will open up more possibilities for this kind of alternate auth scenario
00:15:24 <TimStarling> but then you don't have a backend login attempt if there is no username
00:16:07 <Katie> bd808__: I don't think we should block devunt based on wants, even really wants.
00:16:36 <Katie> TimStarling: Mapping early seems fine.
00:17:12 <Katie> The idea in the initial implementation is to cover the most basic case, in my understanding. A unique e-mail address with a valid password.
00:17:29 <bd808__> it could be wedged into CA in the same places I wedged in the ability to login with a pre-SUL rename user name I think
00:17:55 <Katie> A unique e-mail address with a valid password using MediaWiki core.
00:17:56 <bd808__> it's gross but there is code there that does it
00:18:12 <csteipp> So yeah, figuring out how this goes into CentralAuth is going to be a challenge-- legoktm and I should probably figure that out. So this wont work on WMF sites until we do that.
00:18:38 <TimStarling> and devunt is doing this for WMF, so it is a blocker
00:18:59 <Katie> Oh, is he? I thought this was a volunteer project.
00:19:00 <devunt> mainly for WMF
00:19:25 <TimStarling> it is a volunteer project but volunteers always have specific motivations
00:20:21 <devunt> this is (obviously) based on the task T30085,
00:20:36 <tgr> AuthManager intends to fully replace wgAuth so it might be worth checking it out to avoid working twice
00:20:38 <devunt> and the author of that task said "especially on a big project like Wikimedia"
00:20:53 <csteipp> Good to know. So yeah, we'll have to figure how to make CA do similar-- is there a task for that part specifically?
00:21:04 <tgr> although I don't expect the actual authentication code that handles the email address would have to change much
00:21:18 <Katie> What needs to change in CentralAuth?
00:21:24 <Katie> If you map early, why does CentralAuth care?
00:21:42 <TimStarling> CA manages email addresses, the user_email field is not used
00:21:51 <csteipp> ^ that
00:22:05 <TimStarling> so if we map early, CA needs to hook into the mapping function
00:22:07 <Katie> Oh, really? Ugh.
00:22:09 <csteipp> Have to get CA's version of the email searchable
00:22:23 <devunt> So I'm going to mapping function hookable
00:22:25 <csteipp> Because on most wikis, global accounts *shoudn't* have an email in the local wiki DB
00:22:32 <devunt> for CentralAuth or any other plugins
00:22:33 <csteipp> (there's a bug for that)
00:22:42 <TimStarling> if you want a good reason for mapping early, maybe look at autocreation
00:22:44 <Katie> Not to get off-topic too much, but why is that a good thing?
00:23:12 <Katie> Like why does CentralAuth, in a post-SUL world, need to do its own special thing here and store e-mail addresses separately?
00:23:16 <csteipp> Limit the number of places we store private data
00:23:33 <TimStarling> autocreation by email has to work, if you allow login by email, that is critical for usability
00:23:40 <tgr> in general mapping early seems like the wrong way to this
00:23:46 * csteipp has to leave.. I think the general direction of the patch now looks good, so -2 removed.
00:24:02 <tgr> how do you avoid timing attacks if you can't even map non-existent email addresses to users?
00:24:03 <csteipp> ^ for core. We need to figure out CA.
00:24:12 <devunt> csteipp, thanks!
00:24:21 <devunt> And about user experiences,
00:24:54 <TimStarling> I don't think you can completely avoid timing attacks
00:25:08 <devunt> I initially designed the messages to work as same as github does.
00:25:17 <TimStarling> but CA could provide a dummy-login function which is called when there is no username for an email address
00:26:21 <tgr> anyway AuthManager when used properly will separate most concerns (such as autocreation) so you would only have to write the code to detect when the content of the username field is an email address and fetch the (central)user object based on that
00:26:53 <tgr> that would also be reasonably safe timing-wise, assuming that searching the DB by username and by email take a similar amount of time
00:27:38 <tgr> on second thought you don't really need to assume that
00:28:24 <bd808__> as long as the timing for the email path is always the same you are ok for timing attacks I think
00:28:25 <TimStarling> to be safe from timing attacks, the search function needs to be equally fast when there are zero and one matches
00:28:43 <TimStarling> how do you take out the mysql row construction overhead?
00:28:54 <Katie> Can we buffer the time?
00:28:56 <devunt> I thought we can ignore db querying time, can't we?
00:29:06 <TimStarling> I think it's really not critical, security-wise
00:29:14 <tgr> create a dummy row, make it 1 or 2 results instead?
00:29:45 <bd808__> User::newFromEmail() as written now will leak how many accounts are attached I think
00:29:46 <TimStarling> like I say, I don't think you can eliminate timing altogether
00:29:52 <TimStarling> I think you can try to limit it to say 1ms
00:30:00 <tgr> or use a LIMIT to make it always return one result which is sometimes a dummy with a password that never matches
00:30:20 <TimStarling> then you can start to talk about realistic architectures instead of accounting for every clock cycle
00:31:06 <TimStarling> what we want is for the time difference to be less than the typical timing noise
00:31:25 <devunt> I think variations of server response time is much larger than variations of sql querying time, so we can ignore them.
00:31:40 <TimStarling> I mean, we are only talking about leaking email addresses, which MW does routinely via Special:EmailUser
00:32:22 <TimStarling> we also publish people's email addresses in various other places, like mailing lists and gerrit
00:32:47 <tgr> devunt: as long as that variation has nice properties (e.g. close to normal distribution) an attacker can filter it out
00:33:12 <tgr> I'm not claiming that is an issue significant enough to block the change, just saying that it exists
00:33:15 <TimStarling> the question is whether a spammer would bother to run a thousand queries against a given email address, to average out the times and try to derive a measure of probability
00:33:25 <TimStarling> or whether they would find easier targets
00:34:10 <tgr> TimStarling: I think the more problematic attack model is trying to deanonimize users when you have a suspicion who they might be and what email address they might be using
00:34:11 <TimStarling> oh, and if an attacker can run thousands of login attempts, would they not try brute-forcing the password?
00:34:21 <TimStarling> logins are already rate-limited by IP
00:34:57 <TimStarling> well, if you suspect what their email address is, you could just send mail to it
00:35:16 <TimStarling> if you have 100 ideas about what their email address might be, you could send mail to all of them
00:36:06 <TimStarling> anyway, that is my position, be practical, benchmark it, don't try to count every cycle
00:36:31 <TimStarling> if the benchmarks show a problem then we can find a solution
00:36:47 <devunt> I agree with TimStarling that there is no way to mitigate timing attack completely
00:38:58 <devunt> and there is more than a hundreds of thousands ways to find a someone's email address to make a list of email addresses for spamming
00:39:11 <TimStarling> #info <csteipp> So yeah, figuring out how this goes into CentralAuth is going to be a challenge-- legoktm and I should probably figure that out.
00:39:36 <robla> devunt: I think there are privacy laws that don't acknowledge your point
00:39:53 <Katie> And?
00:40:06 <robla> Katie: laws we need to comply with
00:40:15 <TimStarling> there are privacy laws in the US?
00:40:25 <Katie> If the legal team has a problem, I have faith in them to say something.
00:41:10 <robla> TimStarling: IANAL....I do know the state of California has many
00:41:14 <tgr> again, you might want to make this a part of the AuthManager rewrite instead of doing it twice
00:41:28 <TimStarling> well, if we are claiming that the email address is private data, we should stop doing that
00:41:29 <Katie> What is AuthManager's status?
00:41:46 <TimStarling> since like I say, we give it out routinely via Special:EmailUser
00:41:47 <tgr> https://phabricator.wikimedia.org/T110283 is the rewrite task for CentralAuth although it's not very informative currently
00:42:15 <tgr> it's a Q2 goal I believe - bd808 / bd808__ are those final now?
00:42:24 * robla doesn't know exactly what our privacy policy says about email addresses as PII
00:42:24 <Katie> Q2 means quarter two of what?
00:42:50 <bd808__> by the end of the calendar year (Q2 of WMF fiscal)
00:42:52 <tgr> sorry, that's Oct-Dec this year
00:43:26 <Katie> MediaWiki won't be the, uhh, first software package to have login via e-mail address functionality. The laws of California really aren't relevant here unless the Wikimedia Foundation legal team comes forward and says so.
00:44:26 <devunt> do we have to rewrite this login-via-email codes after AuthManager was implemented in core?
00:44:36 <TimStarling> oh, I see we've lost csteipp
00:44:37 <tgr> again, I wouldn't worry about spammers, I would worry about, say, some Chinese dissident being identified based on email address
00:44:50 <tgr> but TimStarling is right that there are much easier attacks to make
00:44:55 <Katie> MediaWiki's login usability is horrible. We need login via e-mail address and hopefully case-insensitive login one day. I think timing attacks and legal arguments are mostly silly distractions.
00:45:03 <bd808__> I helped give voice to them, but I think the timing attack problem is mostly FUD
00:45:27 <Katie> We can reasonably mitigate, but yeah.
00:45:37 <TimStarling> so csteipp thinks he and legoktm should have input into the CA interface
00:45:38 <bd808__> In practice account disclosure on a public wiki is not a huge problem
00:45:55 <tgr> devunt: all authentication code has to be rewritten to adopt AuthManager
00:46:04 <tgr> well, not so much rewritten as reorganized
00:46:20 <devunt> and AuthManager is going to be implemented in this winter?
00:46:42 <tgr> that's the plan I believe
00:46:55 <robla> bd808__: good point about timing attack stuff. I just wanted to make sure we didn't glibly dismiss emails as public info, which it sounds like we aren't
00:47:00 <TimStarling> so tgr, you are telling devunt to wait, not to implement and migrate?
00:47:09 <Katie> I really, really don't want to see work here stall as the result of potential future goals.
00:47:31 <TimStarling> because I think it is a pretty simple feature and can be done long before December
00:47:50 <bd808__> AuthManager has been in progress for the last 9 months at various rates of speed. Anomie, Tgr and I will be focusing on it as our major body of work for the next 3 months. That should be enough to get it into production
00:48:50 <bd808__> TimStarling: I'd say go ahead with it. we can catch authmanager up later
00:49:08 <devunt> T30085 can be done in 2 weeks or before the ends of this month, I think.
00:49:13 <tgr> TimStarling: not necessarily, but he should look at the AM project and check it's not done in a way that's going to be hard to port
00:49:16 <bd808__> but in theory things like this will be easier to accomplish in the world where authmanager exists
00:49:51 <Katie> In the future, everything will be simpler and better, yes. Today, it'd be really nice to have working login via e-mail address. :-)
00:49:58 <TimStarling> well, it sounds like login by email should be included in the AM design as soon as possible
00:50:32 <tgr> the design is very generic and definitely allows for it
00:51:00 <tgr> it basically makes it the responsibility of the auth plugin what kind of fields they would like to manage
00:51:14 <TimStarling> that is an action item for someone
00:51:23 <bd808__> in the AM model it would be up to each primary provider (eg CA) to handle the input. It breaks things more cleanly than the core code does today
00:51:53 <TimStarling> devunt's patch will have frontend stuff, like changing form labels and error messages, which you will need anyway
00:52:27 <devunt> And I can re-use the codes when migrating this feature into AM after AM is implmented
00:52:54 <TimStarling> the part which needs to be rewritten should be <100 lines
00:53:19 <devunt> thus I think writting codes now is not a waste of time, though
00:53:38 <devunt> Am I wrong about this?
00:53:55 <tgr> #action reading-infrastructure make sure email login is not a problem for T110278 and T110283
00:54:12 <bd808__> devunt: no. I think you should cary on while you have the time and inclination to work on the feature
00:54:53 <TimStarling> ok, anything else in the remaining 5 minutes?
00:55:17 <TimStarling> I think we are pretty much all done
00:55:26 <Katie> devunt: I'm really glad you're working on this. I think it will be a major improvement in MediaWiki's login usability.
00:55:47 <robla> agree with Katie, this will be great!
00:55:49 <devunt> we still have one major problem
00:56:07 <devunt> mediawiki allows duplication of email address
00:56:10 <devunt> how can I handle it?
00:56:32 <TimStarling> I have an idea for this
00:56:44 <legoktm> sorry, I'm late. /me reads up
00:56:54 <tgr> rejecting multi-user emails is good for the first version IMO
00:56:56 <TimStarling> presumably when the same email address is used, it is usually the same person
00:57:02 <Katie> I agree with tgr.
00:57:16 <TimStarling> and thus, in most cases, the same password
00:57:32 <bd808__> legoktm: tl;dr csteipp volunteered you to help figure out CA intergration :)
00:57:42 <TimStarling> so if there is a duplicate email, we can just pick an account randomly and validate the supplied password against it
00:58:15 <TimStarling> if it matches, then we can report a user-friendly error message
00:58:15 <devunt> the current implementation will only process if email address doesn't connected with multiple accounts
00:58:51 <TimStarling> like "this email address is used by more than one account, please log in with your username"
00:59:26 <legoktm> a good chunk of CA is just core copied into a hook that uses a different database
00:59:31 <TimStarling> if the password doesn't match, then we need to give the generic WRONG_INPUT message which doesn't disclose the presence of any accounts
00:59:31 <devunt> I think it is bad pattern in some ways
00:59:36 <legoktm> I don't imagine it will be difficult, just the timing stuff
00:59:58 <Katie> TimStarling: People may still complain about that type of error message violating user privacy.
01:00:03 <devunt> choosing randomly is not a good idea to me
01:00:17 <devunt> and I think it is not a good design pattern
01:00:23 <legoktm> +1 on skipping the multiple users having the same email thing for now
01:00:40 <Katie> We can just reject for now and figure out the harder cases later, right?
01:00:45 <TimStarling> well, if two people share the same email address, they can reset each others' passwords if they like
01:01:04 <TimStarling> they can't really keep anything private from each other
01:01:23 <Katie> Hmmm, right. You said only if the password works. That seems fine.
01:01:53 <Katie> We need a good test suite here. There are a lot of cases.
01:01:53 <tgr> the only drawback is that if the password is not the same everywhere you get somewhat random behavior
01:01:57 <TimStarling> yeah, if the password doesn't match, you have to assume it is an attacker scanning for valid email addresses
01:02:37 <devunt> yes I was worried about what tgr mentioned
01:02:45 <tgr> which I don't think is unusual e.g. people use the same email for their bot but set a different password
01:02:52 <tgr> probably not a big deal though
01:03:06 <legoktm> or people who use the same email and password for multiple bots >.>
01:03:37 <devunt> I think we have a basicially 3 choices here
01:04:03 <devunt> 1. Only accepts if there is only one email address in the database
01:04:26 <devunt> 2. Checking the password of randomly selected account (as TimStarling mentioned)
01:04:47 <devunt> 3. Checking passwords of all accounts that have the same email address
01:04:51 <tgr> TimStarling's suggestion is still 1 just with an improved error message
01:05:04 <TimStarling> we can continue this discussion on phabricator
01:05:24 <TimStarling> I am fine with devunt's implementation (1) for now
01:05:29 <devunt> tgr, oh, yes, it is.
01:05:38 * robla ducks out
01:05:54 <TimStarling> #endmeeting
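
Illustrative note on the approach agreed in the log: option (1) accepts an email login only when exactly one account with a confirmed address matches, and the timing mitigation devunt describes hashes a dummy value when there is no usable match so that both paths do similar work. The sketch below is a rough Python illustration of that idea only. It is not the Gerrit patch and not MediaWiki code (which is PHP). The names Account, make_account, login_by_email and PBKDF2_ROUNDS are hypothetical, and the use of PBKDF2 is an assumption made for the example. As noted in the discussion, this narrows the timing difference but probably does not remove it entirely.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import List, Optional

# Assumption: PBKDF2-SHA256 with an arbitrary round count, chosen for illustration only.
PBKDF2_ROUNDS = 10_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash from the password and salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS)


@dataclass
class Account:  # hypothetical stand-in for a user row
    name: str
    email: str
    email_confirmed: bool
    salt: bytes
    password_hash: bytes


def make_account(name: str, email: str, password: str, confirmed: bool = True) -> Account:
    salt = os.urandom(16)
    return Account(name, email, confirmed, salt, hash_password(password, salt))


# Dummy credentials used when no unique confirmed match exists.
# The expected hash is random bytes, so no supplied password can match it in practice.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = os.urandom(32)


def login_by_email(accounts: List[Account], email: str, password: str) -> Optional[str]:
    """Return the account name on success, or None for every kind of failure.

    Returning a single failure value mirrors the generic
    "the email or password was wrong" message from the discussion.
    """
    matches = [a for a in accounts if a.email == email and a.email_confirmed]
    if len(matches) == 1:
        candidate: Optional[Account] = matches[0]
        salt, expected = matches[0].salt, matches[0].password_hash
    else:
        # Zero matches or a shared address: still hash, against dummy values.
        candidate = None
        salt, expected = _DUMMY_SALT, _DUMMY_HASH
    supplied = hash_password(password, salt)
    # Constant-time comparison of the two hashes.
    ok = hmac.compare_digest(supplied, expected)
    if candidate is not None and ok:
        return candidate.name
    return None
```

A caller would pass the address typed into the login field and treat None as the single generic failure, whether the address is unknown, unconfirmed, shared by several accounts, or paired with a wrong password. The database lookup itself (the list scan here) is not equalized, which is the residual difference TimStarling suggests benchmarking rather than counting cycles.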