
Ability to alert when we get a sudden increase in bad passwords for privileged accounts
Open, MediumPublic

Description

Login failures are stored in various places. We should be able to alert when the number of failures suddenly increases, as we would typically see for password brute forcing.

Failed password attempts for privileged accounts are logged in Elasticsearch. Yelp uses Elasticsearch and ElastAlert (https://github.com/yelp/elastalert) to detect brute forcing; we could do something similar.

In response to the alert, we can start with alerting the security team / ops. If the alerts look reliable, we can add alerting for the account being brute forced. If that appears to reliably detect brute-forcing, we could in the future automatically block the IP from logging in for a short period of time.

Event Timeline

csteipp raised the priority of this task from to Needs Triage.
csteipp updated the task description.
csteipp added a project: Security-Team.
csteipp added a subscriber: csteipp.

Change 464077 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Move auth logging to different channels for easier counting

https://gerrit.wikimedia.org/r/464077

Change 464077 merged by jenkins-bot:
[operations/mediawiki-config@master] Move auth logging to different channels for easier counting

https://gerrit.wikimedia.org/r/464077

Mentioned in SAL (#wikimedia-operations) [2018-11-01T00:05:47Z] <tgr@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:464077|Move auth logging to different channels for easier counting (T150300, T123243)]] (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2018-11-01T00:07:13Z] <tgr@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:464077|Move auth logging to different channels for easier counting (T150300, T123243)]] (duration: 00m 53s)

chasemp renamed this task from "Ability to alert when we get a sudden increase in bad passwords for privileged accounts, to possibly detect password brute-forcing" to "Ability to alert when we get a sudden increase in bad passwords for privileged accounts". Dec 20 2018, 8:46 PM
chasemp edited projects, added User-chasemp; removed Patch-For-Review.
chasemp triaged this task as Medium priority. Dec 9 2019, 5:22 PM

We have prometheus-es-exporter available that will turn the result of ES queries into Prometheus metrics. Alertmanager can easily turn these metrics into alerts.

Is there a Kibana dashboard or saved search we could reference?

Not that I know of. The relevant searches are type:mediawiki AND channel:badpass (all failed login attempts) and type:mediawiki AND channel:badpass-priv (failed login attempts into admin and similar accounts).
(Except the first search will match both channel names; I'm not sure what the right syntax is there.)

(There's also goodpass / goodpass-priv if you want to make it a ratio instead of an absolute number.)
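To illustrate the ratio idea, here is a minimal Python sketch. It assumes the per-window counts for a failure channel (such as badpass-priv) and the matching success channel (goodpass-priv) have already been fetched; the function name and the sample numbers are hypothetical.

```python
def failure_ratio(bad: int, good: int) -> float:
    """Share of login attempts in a window that failed.

    Returns 0.0 when there were no attempts at all, so an idle
    window never looks like an attack.
    """
    total = bad + good
    return bad / total if total else 0.0


# Hypothetical counts for one time window (not real data).
print(failure_ratio(bad=30, good=70))
```

A ratio is less sensitive to overall traffic growth than an absolute count, but it can still swing widely when the total number of privileged logins in a window is small.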

There's also a login throttle hit dashboard, though it is probably less useful, as a careful attacker could spread out their attempts and avoid being throttled.

Thanks @Tgr

I'm assuming -priv suffix means privileged accounts.

Looking at the data, it seems msgname.keyword:wrongpassword AND channel.keyword:badpass-priv gets us a minute-by-minute histogram of bad password attempts for privileged accounts. The data is pretty inconsistent, though, so it would be difficult for me to find an alarm threshold that is useful and actionable.
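As a rough sketch, the same filter could be expressed as an Elasticsearch query body with a per-minute histogram. The channel and msgname fields come from the search above; the @timestamp field name, the one-hour default window, and the function name are assumptions, and older Elasticsearch versions use "interval" rather than "fixed_interval" in date_histogram.

```python
import json


def badpass_priv_histogram_query(since: str = "now-1h") -> dict:
    """Build a query body counting wrong-password events per minute.

    Assumes events carry an @timestamp field; adjust if the index
    uses a different time field.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"msgname.keyword": "wrongpassword"}},
                    {"term": {"channel.keyword": "badpass-priv"}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "per_minute": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": "1m",
                }
            }
        },
    }


print(json.dumps(badpass_priv_histogram_query(), indent=2))
```

The resulting JSON is only a query body; how it is submitted (Kibana, an exporter, or a client library) is left open here.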

Could someone who wants these alerts have a look and identify a sensible alert threshold? I'm not sure we can break it down much further given PII and metrics cardinality concerns.

@sbassett any thoughts? This is a very old task, not sure how relevant it is to the Security team's current thinking.

The dashboard does show some apparent attacks (e.g.) but as long as it's just some troll manually fooling around, that probably shouldn't be alerted on. So maybe growth of bad password volume by an order of magnitude or two?
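One way to read "an order of magnitude or two" is as a multiple of a recent baseline. The following is a minimal sketch under that reading, using the median of recent per-minute counts as the baseline; the factor of 10, the floor of 1, and the function name are illustrative assumptions, not tuned values.

```python
from statistics import median


def is_spike(history, current, factor=10.0, floor=1.0):
    """Return True if `current` is at least `factor` times the baseline.

    `history` holds recent per-minute failure counts. The baseline is
    their median, clamped to `floor` so that a quiet period (median 0)
    does not turn every single failure into an alert.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = max(median(history), floor)
    return current >= factor * baseline


# Hypothetical counts: a typical minute sees 1 to 4 failures.
recent = [2, 3, 1, 4, 2]
print(is_spike(recent, 5))
print(is_spike(recent, 40))
```

The median is used instead of the mean so that one earlier burst in the history window does not inflate the baseline and mask a later one.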

sbassett added a subscriber: Dsharpe.

@colewhite @Tgr -

Thanks for the ping on this. Given the age and dormancy of this task, I'll re-triage it for our team's clinic next Monday. I think @Dsharpe might have more insight into what functionality, if any, is currently desired for this variety of monitoring, as he is closer to the Security-Team's incident response and mitigation policies. I know there was some related work in T213933, which was eventually declined in favor of a potential alternative approach.

Anyhow, my more personal thoughts are that these types of things can be very difficult to monitor in any meaningful sense, especially given an environment like Wikimedia production, where there are issues regarding both large volumes of data and large volumes of noise. Determining various thresholds and rates for what might constitute an actual event of concern can be more art than science. That being said, there may very well be some value in monitoring the -priv channels as mentioned above, so the Security-Team can re-evaluate this and hopefully provide some guidance soon.

We want to investigate and deploy sound, actionable detection and alerting around identity in general, but I am not sure alerting on spikes on 100% failed login attempts will get us very far down that road.

I know this is asking a lot, but if we had some way to add on some detection around the most privileged accounts to detect higher risk behavior or maybe major deviations from normal activity from a particular account (e.g. a login from a place the account owner would never log in from), that would be useful.

I know this is asking a lot, but if we had some way to add on some detection around the most privileged accounts to detect higher risk behavior or maybe major deviations from normal activity from a particular account (e.g. a login from a place the account owner would never log in from), that would be useful.

IIUC, it sounds like you're asking for Anomaly Detection and/or IDS-like capability. Observability can feed data into a system that does it, but we don't have this capability ourselves at this time.

We want to investigate and deploy sound, actionable detection and alerting around identity in general, but I am not sure alerting on spikes on 100% failed login attempts will get us very far down that road.

By "100% failed" do you mean throttled? An alarm on badpass-priv volume would alert on spikes of any kind of failed password-based login attempts.
I think that could be useful for detecting mass dictionary or "pwned passwords" attempts. Most of those would be detected anyway, by LoginNotify and the notified users reaching out, but an attacker could be sneaky about it, and make a large number of attempts, each to a different user.
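The "many attempts, each to a different user" pattern is often called password spraying. A hedged sketch of how it might be flagged from a window of failed-login events follows; it assumes the targeted account name is available for each event, and the thresholds and function name are hypothetical.

```python
from collections import Counter


def looks_like_spraying(failed_usernames, min_accounts=50, max_per_account=3):
    """Flag a window where many accounts each saw only a few failures.

    `failed_usernames` is one entry per failed attempt in the window.
    Accounts with more than `max_per_account` failures are ignored here,
    since LoginNotify-style per-account alerts would already cover them.
    """
    per_account = Counter(failed_usernames)
    lightly_hit = sum(1 for n in per_account.values() if n <= max_per_account)
    return lightly_hit >= min_accounts


# Hypothetical window: 60 different accounts, one failed attempt each.
window = [f"user{i}" for i in range(60)]
print(looks_like_spraying(window))
```

This only counts accounts, so it would sit alongside a volume alarm rather than replace it.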

I know this is asking a lot, but if we had some way to add on some detection around the most privileged accounts to detect higher risk behavior

badpass-priv is already reasonably privileged (admins and higher). Having something for even higher privileges (ifadmin/checkuser/oversighter/steward I assume?) would not be too hard either. If you mean privileged actions as opposed to login attempts, I don't think that can be detected by volume: an attack will not result in an unusually high number of, say, permission changes or JS page edits, since the attacker only needs to do one or two.

or maybe major deviations from normal activity from a particular account (e.g. a login from a place the account owner would never log in from)

MediaWiki does not track login locations (the SecureSessions extension does that, but it's long unmaintained + was never deployed on Wikimedia), so it would have to be updated (which seems nontrivial) or that would have to happen in some external system that aggregates location information from logs. (IP logging to logstash should probably be fixed before that; currently most logged events send the reverse proxy's IP, not the actual client IP.)