icinga notification if elevated writing to badpass.log
Open, Normal, Public

Description

It would be useful to know when more than the normal volume is being written to this log. The same goes for general logs/errors.

Reedy created this task. Nov 8 2016, 9:56 PM
Restricted Application added a subscriber: Aklapper. Nov 8 2016, 9:56 PM
Tgr added a subscriber: bd808. Nov 17 2016, 5:17 AM

@bd808 pointed to the Kibana watcher plugin: https://github.com/elasticfence/kaae

bd808 added a comment. Nov 17 2016, 4:44 PM

> @bd808 pointed to the Kibana watcher plugin: https://github.com/elasticfence/kaae

If we wanted to try this plugin out, I think we would want to set up a new Kibana instance somewhere. The current logstash.wikimedia.org Kibana is actually 3 backend servers running behind an LB. This raises 2 problems: you don't know which of the 3 you are being round-robined to, and if they share state via the Elasticsearch cluster (which looks like how things are stored), you could get 3 alerts for each watch that fired.

We could (also) export the number of lines written to badpass to Graphite and set up an Icinga alert. The metric and the alert would both be public, though I don't think that would be particularly troubling.

bd808 added a comment. Nov 17 2016, 6:00 PM

> We could (also) export the number of lines written to badpass to Graphite and set up an Icinga alert. The metric and the alert would both be public, though I don't think that would be particularly troubling.

https://graphite.wikimedia.org/render?from=-2hours&until=now&width=400&height=250&target=logstash.rate.mediawiki.badpass.INFO.count&_uniq=0.9259289370548927&title=logstash.rate.mediawiki.badpass.INFO.count

@bd808 looks good! I guess the regular Icinga/Graphite check could be used in this case, then.
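A threshold check of this kind essentially fetches the series from Graphite and compares it against warning/critical limits. Below is a minimal PHP sketch of that decision logic, under stated assumptions: the `checkBadpassRate()` name, the thresholds and the five-point averaging window are hypothetical; only the metric name and the render endpoint come from this thread (`format=json` is a standard Graphite render parameter).

```php
<?php
// Hypothetical sketch of an Icinga-style check for the badpass rate.
// Input: the body of a Graphite render call with format=json, e.g.
//   /render?target=logstash.rate.mediawiki.badpass.INFO.count&from=-30min&format=json
// Thresholds and the averaging window are illustrative assumptions.

const STATE_OK = 0;       // standard Nagios/Icinga plugin exit codes
const STATE_WARNING = 1;
const STATE_CRITICAL = 2;
const STATE_UNKNOWN = 3;

/**
 * @return array{0: int, 1: string} [exit code, plugin status line]
 */
function checkBadpassRate( string $json, float $warn, float $crit, int $window = 5 ): array {
	$series = json_decode( $json, true );
	if ( !is_array( $series ) || !isset( $series[0]['datapoints'] ) ) {
		return [ STATE_UNKNOWN, 'UNKNOWN: unexpected Graphite response' ];
	}
	// Each datapoint is [value, timestamp]; value is null for empty buckets.
	$values = [];
	foreach ( $series[0]['datapoints'] as $point ) {
		if ( isset( $point[0] ) ) {
			$values[] = (float)$point[0];
		}
	}
	if ( $values === [] ) {
		return [ STATE_UNKNOWN, 'UNKNOWN: no datapoints in range' ];
	}
	// Average the most recent points so a single noisy bucket does not alert.
	$recent = array_slice( $values, -$window );
	$avg = array_sum( $recent ) / count( $recent );
	$summary = sprintf( 'badpass avg %.1f over last %d points', $avg, count( $recent ) );
	if ( $avg >= $crit ) {
		return [ STATE_CRITICAL, "CRITICAL: $summary" ];
	}
	if ( $avg >= $warn ) {
		return [ STATE_WARNING, "WARNING: $summary" ];
	}
	return [ STATE_OK, "OK: $summary" ];
}
```

A wrapper would fetch the render URL (for example with `file_get_contents()`), `echo` the returned status line and `exit()` with the returned code, which is what Icinga reads from a check plugin. Averaging a short window trades a little detection latency for fewer flapping alerts.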

fgiunchedi triaged this task as Normal priority. Nov 30 2016, 2:10 AM
chasemp added a subscriber: chasemp. May 4 2018, 6:49 PM
chasemp added a comment. Tue, Oct 2, 12:50 PM

I had a few minutes so I looked at this because it would be super swell to have it rigged up. It's a bit complicated at the moment.

Note T123243: Ability to alert when we get a sudden increase in bad passwords for privileged accounts, to possibly detect password brute-forcing and T193769: Thousands of failed login attempts (wrong password) are closely related.

This metric is still present:

logstash.rate.mediawiki.badpass.INFO.count

https://graphite.wikimedia.org/render?from=-2hours&until=now&width=800&height=500&target=logstash.rate.mediawiki.badpass.INFO.count&_uniq=0.9259289370548927&title=logstash.rate.mediawiki.badpass.INFO.count&from=-72h

But looking at badpass.log, I noticed entries that indicate successful authentication rather than failed attempts. That made me wonder whether that graph is a sane representation of what we would want to monitor. Talking with @Reedy a bit, I tracked it back to {T150554} (which was eventually declined, but changes were associated with it).

https://noc.wikimedia.org/conf/highlight.php?file=CommonSettings.php

```php
use MediaWiki\Logger\LoggerFactory;

// T150554 log successful attempts too
$wgHooks['AuthManagerLoginAuthenticateAudit'][] = function ( $response, $user, $username ) {
	if ( $response->status === \MediaWiki\Auth\AuthenticationResponse::PASS ) {
		global $wgRequest;
		$headers = function_exists( 'apache_request_headers' ) ? apache_request_headers() : [];

		$privGroups = wfGetPrivilegedGroups( $username, $user );
		$logger = LoggerFactory::getInstance( 'badpass' );
		$logger->info( 'Login succeeded for {priv} {name} from {ip} - {xff} - {ua} - {geocookie}', [
			'successful' => true,
			'groups' => implode( ', ', $privGroups ),
			'priv' => count( $privGroups ) ? 'elevated' : 'normal',
			'name' => $user->getName(),
			'ip' => $wgRequest->getIP(),
			'xff' => @$headers['X-Forwarded-For'],
			'ua' => @$headers['User-Agent'],
			'geocookie' => $wgRequest->getCookie( 'GeoIP', '' ),
		] );
	}
};
```

https://phabricator.wikimedia.org/source/mediawiki-config/browse/master/wmf-config/CommonSettings.php

https://phabricator.wikimedia.org/rOMWCa3eb73714cd332ee2139dade020f3aa88bab8c76

Authored by Tgr on Nov 12 2016, 1:26 PM.
Log successful login attempts for a while
Includes https://gerrit.wikimedia.org/r/#/c/321114/ too

https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/321114 was abandoned in favor of
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/321926/

It seems this was added because of a need to quantify logins historically.

(Another note, as indicated on the task: CentralAuth does record failed logins, but that logging is somewhat sparse: https://logstash.wikimedia.org/app/kibana#/dashboard/default?_g=h@e0234f6&_a=h@fcff38f)

So we'll need to unwind this a bit, making sure badpass.log only records events associated with failed authentication, and either move the successful logins to another file or stop logging them. Since it's been happening for the last few years, I suspect we should just keep that logging: it's fairly useful, and it tends to get added ad hoc when needed anyway.
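Until the logging is unwound, a failure-only count can in principle be recovered by filtering on the `successful` flag that the hook above adds. A minimal PHP sketch, assuming (hypothetically) that badpass records are available as one JSON object per line with a `context` object; the `countFailedLogins()` name and the export format are illustrative, and whether failure entries also carry a `priv` key is not confirmed here.

```php
<?php
// Hypothetical sketch: count only failed logins in badpass records exported
// one JSON object per line. The export format is an assumption; only the
// `successful` and `priv` context keys appear in the hook above.

/**
 * @param iterable<string> $lines JSON-encoded log records
 * @return array<string, int> failure counts keyed by privilege class
 */
function countFailedLogins( iterable $lines ): array {
	$counts = [ 'elevated' => 0, 'normal' => 0, 'unknown' => 0 ];
	foreach ( $lines as $line ) {
		$record = json_decode( $line, true );
		if ( !is_array( $record ) ) {
			continue; // skip malformed lines instead of counting them
		}
		$context = $record['context'] ?? [];
		// Successful logins are tagged successful => true by the audit hook,
		// so they are excluded; everything else is treated as a failure.
		if ( is_array( $context ) && ( $context['successful'] ?? false ) === true ) {
			continue;
		}
		$priv = is_array( $context ) ? ( $context['priv'] ?? 'unknown' ) : 'unknown';
		if ( !is_string( $priv ) || !isset( $counts[$priv] ) ) {
			$priv = 'unknown';
		}
		$counts[$priv]++;
	}
	return $counts;
}
```

Splitting by `priv` keeps the elevated-account signal separate, which is the case T123243 cares about; the cleaner long-term fix is still to stop mixing successes into the channel being counted.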

Change 464077 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Move auth logging to different channels for easier counting

https://gerrit.wikimedia.org/r/464077

Thanks @Tgr!