LoginNotify should inform users of the IP address of failed login attempts to their account
Open, HighPublic15 Story Points

Description

When someone tries to reset our password, be it ourselves or third parties, the IP address of the requestor of the password reset is sent to our inbox with the password reset email. LoginNotify should do the same.

I don't think there would be major privacy concerns as long as its noted where appropriate that trying to login on an account may disclose private data to the owner of that account, if that's not already covered by https://wikimediafoundation.org/wiki/Privacy_policy#To_Protect_You.2C_Ourselves_.26_Others. This will need to be checked with WMF Legal.

Similarly, unsuccessful logins should leave CU traces to prevent abuse, otherwise this feature can become a source of annoyance.

New Notification Data Form

Filling out this form will help developers and product people understand your idea and will provide the information required to implement it. To see examples of the types of answers required, have a look at this sample form. To understand unfamiliar terms, visit the glossary. 

Basic information

  - Purpose of the notification:  To inform the user about failed login attempts to his account
  - Notification name:  Unchanged, reusing notification-known-header-login-fail notification from LoginNotify
  - What triggers notification?: Login attempts
  - "Notice" or "Alert"?:  Alert
  - Notification type (standard, bundled, expandable bundle):  standard, I think (unchanged from the existing notification)

Wording

For a single message

  • Header: Unchanged
  • Body:  Added a new body which reads "IP address of the last login attempt: $1" where $1 is replaced with the IP address

For Bundled Messages

  • Main, bundling message:
  • Subsidiary, bundled message:

Links

  - Primary link target: None added
  - Primary link label (for email display only): None added

  - #1 secondary link target: 
  - #1 secondary link label:

  - #2 secondary link target:
  - #2 secondary link label:

Icon

  - Icon name:  Unchanged
  - Link to graphic/example: Unchanged

Related Objects

There are a very large number of changes, so older changes are hidden. Show Older Changes
Huji added a comment.Dec 29 2017, 10:51 PM

... I am even more surprised that someone might be worried about issues of IP privacy of someone who willingly is trying to force illegally a system.

Because not all cases of failed login attempts are "willing attempts". What if there are two users Huji126 and Huji127, the latter being mine, and when I am trying to log into my own account, I accidentally type the other username? Should my IP be disclosed to that other user (of course they wouldn't know it was me, but we are still sharing an IP with them not due to a willing attempt on my part to hack into their account, but because of a negligent typo)?

This is what our Legal team at the WMF had to make a decision on. And they have made a decision that it is okay for the IP address of failed login attempts to be disclosed, as long as it is not retained (and therefore not accessible) for longer than the WMF retention policy.

Because not all cases of failed login attempts are "willing attempts".

Maybe, but how can you tell?

What if there are two users Huji126 and Huji127, the latter being mine, and when I am trying to log into my own account, I accidentally type the other username? Should my IP be disclosed to that other user?

Yes, it should.
No one would relate your username to that IP; further, the IP address is not amongst the sensitive information like i.e. religion, political party, sex orientation or such.
Thus there's no reason to not disclose it.

This is what our Legal team at the WMF had to make a decision on. And they have made a decision that it is okay for the IP address of failed login attempts to be disclosed, as long as it is not retained (and therefore not accessible) for longer than the WMF retention policy.

Still, I don't know what to do with such an useless tool then. They'd better spend their time to create useful tools rather than these annoying ones that don't give you any useful information. Either they provide the IP from which the attempt has been performed or retire that useless tool.

Huji added a comment.Dec 31 2017, 9:30 PM

@Blackcat I happen to agree with you. But neither your nor my opinion is not pervasive here. What matters is the Legal's opinion, which also seems to be in agreement with the both of us. There is no need to extend the discussion any further :)

... or retire that useless tool.

The tool is not entirely useless. It at least makes you aware of the attempts, and encourages you to use strong passwords. I bet people are more likely to change their password to something stronger if they know they have been targeted by failed login attempts, than if they just receive a general recommendation about strong passwords.

But I agree that the tool will be somewhat more useful if it shows you where the failed attempt was made from. It is still not tremendously useful just for I as a user to see the IP of the attacker (okay, I know the attempt was made from an IP that belongs to a wireless internet provider, such as AT&T; so what?) What would make it ultimately useful is the fact that CheckUsers would then be able to backtrack the data to possibly find the account behind it, or can setup AbuseFilters by IP range.

... or retire that useless tool.

The tool is not entirely useless. It at least makes you aware of the attempts, and encourages you to use strong passwords. I bet people are more likely to change their password to something stronger if they know they have been targeted by failed login attempts, than if they just receive a general recommendation about strong passwords.

I'd like to add that the extension also gives you a notification if your account got successfully logged into from a new IP. I like to think that is useful to some extent. :)

The tool is not entirely useless. It at least makes you aware of the attempts, and encourages you to use strong passwords.

Which I already do, having been working in the IT for 27 years. Thus is useless because who knows the matter doesn't need it, and who doesn't know doesn't understand the importance of a strong password.

I'd like to add that the extension also gives you a notification if your account got successfully logged into from a new IP. I like to think that is useful to some extent. :)

Sure, Niharika: It tells you that a thief has already got in your house and has stolen good. I would like to be warned when one attempts to get in.

I'd like to add that the extension also gives you a notification if your account got successfully logged into from a new IP. I like to think that is useful to some extent. :)

Sure, Niharika: It tells you that a thief has already got in your house and has stolen good. I would like to be warned when one attempts to get in.

It also recommends you change your locks so no more thieves can get in, in future. If it weren't for the extension, you won't even know someone entered your house, let alone they did something bad.

It also recommends you change your locks so no more thieves can get in, in future. If it weren't for the extension, you won't even know someone entered your house, let alone they did something bad.

I would notice no matter the extention. Again, I've been working into the IT for 27 years and don't need a reminder. A tool must tell me where an attack comes from, not to warn me to choose a strong password (which I already do). What is useful for such a tool? Only to annoy people?

@Blackcat, I feel like we've answered that question. The use of the tool is to provide people with information that someone has tried to access their account, so they can make sure they have a strong, unique password. Providing the IP -- which I believe Huji is working on -- will help people to know whether this was actually an attempt from an outsider, or a login that they already know about.

If the login notifications are annoying you, you can turn them off in preferences.

Hey @Huji, how's the work on this going? Let me know if you want any help on it.

Huji added a comment.Jan 3 2018, 12:17 AM

@kaldari help is always appreciated. Can you start by reviewing the patch for T183722 please? That is the key step for us to be able to test the code for this and for T174562

He7d3r added a subscriber: He7d3r.Jan 5 2018, 12:09 AM
jmatazzoni moved this task from Untriaged to Old Untriaged on the Collaboration-Team-Triage board.
jmatazzoni moved this task from Untriaged to Old Untriaged on the Collaboration-Team-Triage board.
jmatazzoni moved this task from Untriaged to Old Untriaged on the Collaboration-Team-Triage board.
jmatazzoni moved this task from Untriaged to Old Untriaged on the Collaboration-Team-Triage board.
jmatazzoni moved this task from Untriaged to Old Untriaged on the Collaboration-Team-Triage board.
jmatazzoni moved this task from Untriaged to Old Untriaged on the Collaboration-Team-Triage board.
jmatazzoni moved this task from Untriaged to Old Untriaged on the Collaboration-Team-Triage board.

@Aklapper, pay attention. It could be a bug in phabricator. I got a lot of identic letters.

@IKhitron: Feel free to file a separate task about things unrelated to the topic of this very task.

Scott added a subscriber: Scott.Jan 20 2018, 4:27 PM

@Huji, what's the status of this task? I saw you working on https://gerrit.wikimedia.org/r/#/c/376451/ and started wondering. This task only asks for the IP address and not the location. Now that that's getting logged in CU, it should be easy to expose it in the notification.

Also note that we don't have Legal/Privacy sign-off for doing Geolocation (but we do for IP address).

Huji added a comment.Mar 8 2018, 11:11 PM

@Niharika I am working on them in parallel. I have a broken patch (not on gerrit yet) for the Echo notification; the fact that Echo combines five failed attempts into one notification (so you get notified only at 5, 10, 15, 20, ... failed attempts) has made it a bit difficult because we can (rarely) have cases when 5 attempts are made from five different IP addresses. So I am working on a way to combine the IPs into one Echo message as well. My lack of familiarity with Echo is slowing me down, too.

@kaldari: correct.

@Niharika I am working on them in parallel. I have a broken patch (not on gerrit yet) for the Echo notification; the fact that Echo combines five failed attempts into one notification (so you get notified only at 5, 10, 15, 20, ... failed attempts) has made it a bit difficult because we can (rarely) have cases when 5 attempts are made from five different IP addresses. So I am working on a way to combine the IPs into one Echo message as well. My lack of familiarity with Echo is slowing me down, too.

We probably want to show them a range in that case and not all of the IPs individually. We can help brainstorm more on this if you'd like.

Huji added a comment.Mar 8 2018, 11:27 PM

So I did think about range. Here are situations that I can think of, in which multiple IPs might have failed attempts for the same account:

  • Malicious: a user is trying to gain access to my account, but to circumvent MediaWiki safeguards, they are using a botnet or something like that
  • Grey area: a user is trying to log in via TOR; every attempt may be routed differently in TOR, hence a different IP each time.
  • Non-malicious: a user tries and fails on their phone, thinks "may be I just can't type my password correctly on the little phone screen", switches to laptop and fails again; phone and laptop on two different networks (e.g. one on cellular, one on home wifi)
  • Non-malicious: a user tries and fails from their computer, tries again later and fails again but by then their IP has been changed by their ISP (rare)

In all except the last, the various IPs are most likely not from the same subnet, so grouping them into one range won't be practical. If a malicious user tries to log into my account from 8.4.4.8, 147.11.57.19 and 220.36.55.35 there is no way we can group them. And if he happens to try from 220.36.55.35 and 220.102.19.48 it doesn't make sense to combine them into 220.0.0.0/8 because those IPs don't even belong to the same ISP (even though they both originate in Japan).

Lastly, one motivation behind showing the actual IP is that if a user keeps getting failed attempts from the same IP, they can report to CUs so they can conduct more investigation on that IP. Turning them into ranges may not be a great idea in that regard either.

So I did think about range. Here are situations that I can think of, in which multiple IPs might have failed attempts for the same account:

  • Malicious: a user is trying to gain access to my account, but to circumvent MediaWiki safeguards, they are using a botnet or something like that
  • Grey area: a user is trying to log in via TOR; every attempt may be routed differently in TOR, hence a different IP each time.
  • Non-malicious: a user tries and fails on their phone, thinks "may be I just can't type my password correctly on the little phone screen", switches to laptop and fails again; phone and laptop on two different networks (e.g. one on cellular, one on home wifi)
  • Non-malicious: a user tries and fails from their computer, tries again later and fails again but by then their IP has been changed by their ISP (rare)

    In all except the last, the various IPs are most likely not from the same subnet, so grouping them into one range won't be practical. If a malicious user tries to log into my account from 8.4.4.8, 147.11.57.19 and 220.36.55.35 there is no way we can group them. And if he happens to try from 220.36.55.35 and 220.102.19.48 it doesn't make sense to combine them into 220.0.0.0/8 because those IPs don't even belong to the same ISP (even though they both originate in Japan).

    Lastly, one motivation behind showing the actual IP is that if a user keeps getting failed attempts from the same IP, they can report to CUs so they can conduct more investigation on that IP. Turning them into ranges may not be a great idea in that regard either.

Okay. In my opinion, the easiest way forward would be to show the IP if there is one attempt. If it's a bundled notification (i.e. multiple attempts), don't try to show IPs.

Change 419088 had a related patch set uploaded (by Huji; owner: Huji):
[mediawiki/extensions/LoginNotify@master] Show the IP address of the login attempt in the Echo notification

https://gerrit.wikimedia.org/r/419088

Huji added a comment.EditedMar 13 2018, 12:41 AM

Everything I tried failed and my code was messy, so I created a bare-bones patch at https://gerrit.wikimedia.org/r/#/c/419088/3/ and here I explain what is wrong with it:

  1. It should not use $wgRequest->getIP() because that would be the IP of the user who is seeing the notification (target user), not the IP address from which the failed attempt actually originated from. I am using it in the code just to demonstrate what the output would look like.
    1. Later in PS6, I try to pass the actual IP, but even that is bad practice because it means we will be retaining IP address in yet another table, especially a table that is much harder to purge (because IP will be serialized among other parameters). Hence the need for the parameter to be a cuc_id.
    2. I have to figure out how to pass the right parameter. The parameter to pass would be a cuc_id value referencing the row in cu_changes that would contain the IP address.
  2. As stated above, the parameter should be a cuc_id. I had originally planned for the cuc_id to be returned by the onAuthManagerLoginAuthenticateAudit function in CheckUser but @MaxSem talked me out of it in https://gerrit.wikimedia.org/r/#/c/408049/
    1. We need to determine why MaxSem did not agree with the callback function returning a value.
    2. Obviously, the code in my new patch should be expanded such that if no cuc_id is returned or if the returned ID is not found in cu_changes (for example because CU data was purged), the notification just doesn't show the message body.

With all those caveats in mind, this is how it will look like if you have chosen to be shown one notification per failed login attempt:

For batched notifications, I still have work to do. We can simplify it and only show the IP for the last attempt in a batch or something like that. But let's first figure out how to show the right IP in the right place.

Huji added a comment.EditedMar 13 2018, 1:11 AM

I recreated the path I took last time (and eventually failed at) in PS6 of the patch above. It doesn't look clean and actually doesn't show the IP either, but it is trying to pass the IP all the way through the wiring of LoginNotify. @Reedy, @MaxSem can I ask you to take a look please?

Huji added a comment.Mar 13 2018, 11:26 PM

Alright, I finally figured out how to handle the passing of parameters. Right now, what is passed is the IP address itself (which as we agreed above is bad).

As soon as I hear back from @MaxSem here about the cons of pros of the callback function in CheckUser returning a row ID, I will work on using that (instead of the IP) here.

Huji updated the task description. (Show Details)Mar 13 2018, 11:29 PM
Huji added a comment.EditedMar 14 2018, 1:31 AM

I think I know why @MaxSem recommended not returning a value from CheckUserHooks::onAuthManagerLoginAuthenticateAudit(). Both that and LoginNotify\Hooks::onAuthManagerLoginAuthenticateAudit() use the AuthManagerLoginAuthenticateAudit hook; therefore, not only we cannot guarantee that the CU callback function is run before the LN callback function, but also, there is no way to pass the output of the CU callback function (may it be a cu_id) to the LN function in order to use it for generating the Echo notification.

The patch I submitted is now fully functional (shows the IP address of the login attempt correctly); however, in its current form, it ends up storing the user's IP in the echo_event table as a serialized value in the event_extra column, like this example: a:3:{s:11:"notifyAgent";b:1;s:2:"ip";s:7:"100.200.103.206";s:5:"count";i:79;} (where 100.200.103.206 was the IP address).

This is problematic because there is no way to purge just the IP from this table. The only alternatives that come to my mind are:

  1. To write a purge script that completely removes the rows that have event_type of "login-fail-new".
  2. To add a new column to that table called event_extra_private which would only contain private information, and then purge that column.

Approach 2 is what AbuseFilter does (it has an afl_ip field which stores the IP address associated with an abuse log entry, and is purged regularly). I think it is best to have a serialized field and not a plain field like the case of AbuseFilter. The reason is we are envisioning a future in which we would want to show more private data than just the IP (hence T174553). So it is best to just have one field in which we store the IP, the GeoNames ID for the city/country, the ISP subnet, etc. and have a purge script that just does that.

I am going to stop here. @Niharika I know that you are eager for this work to be completed; so am I. But I think we are now facing an important data architecture question which is best answered by a more seasoned MediaWiki developer. To the best of my knowledge, the original plan of just storing a cu_id in the Echo tables is impossible, but I could be wrong. If I am not wrong, then we have to make a choice between 1 & 2, and if we choose 2, then we need to patch Echo (which I am happy to take a stab at, but I would prefer to be done by someone more familiar with Echo as well).

Gryllida added a subscriber: Gryllida.

@Huji: Echo notifications are meant to be transient, not permanent, and I think with LN notifications especially, there is no reason they should be retained indefinitely. It seems like the easiest solution would be to purge all LN notifications after 90 days (the entire notification, not just the event_extra column), possibly with the same clean-up mechanism that purges all notifications after 2000 (per user).

Huji added a comment.Apr 4 2018, 8:30 PM

I'm happy to quickly modify my patch to put the IP in event_extra as soon as a 90-day purging script is made for Echo and enabled on WMF.

This task would probably need to be added to includes/jobs/NotificationDeleteJob.php.

Huji added a comment.EditedApr 6 2018, 1:08 AM

@kaldari no. That job is only run when a notification is sent. Even though you might think it is safe to assume that there are so many notifications that the job will be called many times a minute, that is only correct for big projects (such as English Wikipedia). Smalller WMF projects (such as newly created wikis, the Ombudsmen Wiki, etc.) may not have even a single notification for several weeks, and this makes is theoretically possible for us to retain data beyond the retention period.

Therefore, we should do it the right way, which is create a new maintenance script in Notifications similar to https://phabricator.wikimedia.org/diffusion/ECHU/browse/master/maintenance/purgeOldData.php and schedule it to be run a regular basis (e.g. daily).

@Huji: Right you are.

TheDJ added a comment.Apr 6 2018, 11:33 AM

if we schedule and require something to run daily, we should probably have an internal error/whistleblower (possibly in NotificationDeleteJob.php) when we detect that such a cleanup script is NOT running.

Hmm, can we just pass cu_changes.cuc_id and load from Checkuser data, thus keeping all the private information in one place?

Huji added a comment.EditedApr 10 2018, 11:09 PM

@MaxSem please see T174388#4048541 in which I explained why that is not possible. Can I ask you to confirm my analysis is correct?

jrbs added a subscriber: jrbs.May 3 2018, 5:03 PM

I don't think there would be major privacy concerns

This assumes that all login attempts to the wrong username are malicious. I imagine a lot can be attributed to typos.

This could also be used to extract target users' IP addresses by deliberately registering multiple accounts which are common typos of a target usernames. The chance of success for single user might not be that high, but used against a large list of users it could be successful.

KTC added a subscriber: KTC.May 3 2018, 9:22 PM
Huji added a comment.May 3 2018, 10:36 PM

I don't think there would be major privacy concerns

This assumes that all login attempts to the wrong username are malicious. I imagine a lot can be attributed to typos.

This could also be used to extract target users' IP addresses by deliberately registering multiple accounts which are common typos of a target usernames. The chance of success for single user might not be that high, but used against a large list of users it could be successful.

But we already have a safeguard for that (through AntiSpoof): we don't allow one to create accounts with usernames too similar to an existing account.

AntiSpoof isn't foolproof though, e.g. it disallows Тhryduulf (the first letter is Cyrillic) but probably not Thryduuulf (too many 'u's) or Awkwrad42 (typo for my alt Awkward42).

Huji added a comment.May 3 2018, 10:41 PM

That is fair.

I wonder how other service providers (such as Facebook or Google) approach this, as we know they have had a similar feature for years.

I would definitely support this. We see it everywhere else, on edits, password resets and so on. So login attempts would be a natural progression. What about 2 Factor authentication for all Wikipedia users also?

jrbs added a comment.EditedMay 4 2018, 6:18 PM

I would definitely support this. We see it everywhere else, on edits, password resets and so on. So login attempts would be a natural progression. What about 2 Factor authentication for all Wikipedia users also?

Unfortunately there are scalability issues we need to iron out before we can do something like this. (e.g. when someone forgets their 2FA codes or loses their phone, there is no easy way to get them back without asking a developer to alter the database.)

I'm personally of the opinion that we should have 2FA for all users, but at present there is no failsafe if things go wrong with it.

chasemp added a subscriber: chasemp.May 4 2018, 6:46 PM
Cwek added a subscriber: Cwek.May 16 2018, 9:31 AM