It'd be useful to know when more than normal is being written to this log (badpass.log). The same goes for general logs/errors.
Related incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/20161112-OurMine
Reedy | Nov 8 2016, 9:56 PM
F4735703: render.png | Nov 17 2016, 6:00 PM
Subject | Repo | Branch | Lines +/-
---|---|---|---
Move auth logging to different channels for easier counting | operations/mediawiki-config | master | +33 -38
If we wanted to try this plugin out, I think we would want to set up a new kibana instance somewhere. The current logstash.wikimedia.org kibana is actually 3 backend servers running behind an LB. This raises 2 problems: you don't know which of the 3 you are being round-robined to; and if they share state via the elasticsearch cluster (which looks like how things are stored), then you would potentially get 3 alerts for each watch that fired.
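To illustrate the duplicate-alert concern, here is a minimal sketch of collapsing alerts that several backends might emit for the same watch. The alert shape (`watch_id`, `fired_at` in epoch seconds) and the 60-second window are assumptions made for illustration; they are not taken from the plugin or from the kibana setup discussed here.

```python
def dedupe_alerts(alerts, window_seconds=60):
    """Keep the first alert per (watch_id, time bucket).

    `alerts` is a list of dicts with hypothetical keys:
      - "watch_id": identifier of the watch that fired
      - "fired_at": epoch seconds when the backend fired it
    Alerts for the same watch that land in the same time bucket are
    treated as duplicates from different backends.
    """
    seen = set()
    unique = []
    for alert in alerts:
        key = (alert["watch_id"], alert["fired_at"] // window_seconds)
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique
```

Fixed buckets are the simplest choice, but two alerts that straddle a bucket boundary (for example 119 and 120 seconds with a 60-second window) would both be kept, so this only reduces duplicates rather than guaranteeing exactly one alert.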
We could (also) export the number of lines written to badpass to graphite and set up an icinga alert. The metric would be public, though, and so would the alert; I don't think that would be particularly troubling.
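As a rough sketch of what "more writing than normal" could mean for such a metric, the function below compares the latest per-interval line count against a baseline built from recent counts. The function name, the 3-sigma rule, and the floor of 10 are all assumptions for illustration; a real icinga/graphite check may simply use static warning and critical thresholds instead.

```python
from statistics import mean, stdev


def is_spike(recent_counts, latest, sigmas=3.0, min_floor=10):
    """Return True when `latest` is well above the recent baseline.

    recent_counts: line counts for earlier intervals (hypothetical input,
                   e.g. values fetched from a metrics store).
    latest:        line count for the newest interval.
    sigmas:        how many standard deviations above the mean count as
                   a spike (assumed value, tune as needed).
    min_floor:     never alert at or below this count, so quiet periods
                   with tiny variance do not trigger on noise.
    """
    if len(recent_counts) < 2:
        # Not enough history for a standard deviation; use the floor only.
        return latest > min_floor
    threshold = max(mean(recent_counts) + sigmas * stdev(recent_counts), min_floor)
    return latest > threshold
```

For example, with a history of `[5, 6, 5, 7, 6]` a latest count of 50 is flagged, while a latest count of 7 is not, because it stays at or below the floor.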
@bd808 looks good! I guess the regular icinga/graphite check could be used in this case then
I had a few minutes so I looked at this because it would be super swell to have it rigged up. It's a bit complicated at the moment.
Note T123243: Ability to alert when we get a sudden increase in bad passwords for privileged accounts and T193769: Thousands of failed login attempts (wrong password) are closely related.
This is still present...
logstash.rate.mediawiki.badpass.INFO.count
but in looking at badpass.log I noticed content that seems to indicate successful authentication rather than failed. That set me wondering whether that graph is a sane representation of what we would want to monitor. In talking with @Reedy a bit, I tracked it back to T150554 (which was eventually declined, but changes were associated with it):
https://noc.wikimedia.org/conf/highlight.php?file=CommonSettings.php
// T150554 log successful attempts too
$wgHooks['AuthManagerLoginAuthenticateAudit'][] = function ( $response, $user, $username ) {
    if ( $response->status === \MediaWiki\Auth\AuthenticationResponse::PASS ) {
        global $wgRequest;
        $headers = function_exists( 'apache_request_headers' ) ? apache_request_headers() : [];
        $privGroups = wfGetPrivilegedGroups( $username, $user );
        $logger = LoggerFactory::getInstance( 'badpass' );
        $logger->info( 'Login succeeded for {priv} {name} from {ip} - {xff} - {ua} - {geocookie}', [
            'successful' => true,
            'groups' => implode( ', ', $privGroups ),
            'priv' => count( $privGroups ) ? 'elevated' : 'normal',
            'name' => $user->getName(),
            'ip' => $wgRequest->getIP(),
            'xff' => @$headers['X-Forwarded-For'],
            'ua' => @$headers['User-Agent'],
            'geocookie' => $wgRequest->getCookie( 'GeoIP', '' ),
        ] );
    }
};
https://phabricator.wikimedia.org/rOMWCa3eb73714cd332ee2139dade020f3aa88bab8c76
Authored by Tgr on Nov 12 2016, 1:26 PM.
Log successful login attempts for a while
Includes https://gerrit.wikimedia.org/r/#/c/321114/ too
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/321114 abandoned in favor of
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/321926/
It seems like this was due to the need to quantify logins historically.
(Another note, as indicated on the task: CentralAuth does record failed logins, but the logging is somewhat sparse: https://logstash.wikimedia.org/app/kibana#/dashboard/default?_g=h@e0234f6&_a=h@fcff38f)
So we'll need to unwind this a bit to make sure badpass.log only records events we want associated with failure to authenticate, and potentially shift the successful logins to another file or discontinue them. Since it's been happening for the last few years, I suspect we should just keep it, as it's a fairly useful thing that otherwise gets added ad hoc as needed anyway.
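Until the two kinds of events are separated at the source, a count of failures has to filter out the success entries. A minimal sketch, assuming only that success lines contain the "Login succeeded for " prefix from the message template quoted above; the format of failure lines is not shown in this task, so everything else is simply treated as "other":

```python
SUCCESS_MARKER = "Login succeeded for "


def split_auth_events(lines):
    """Split badpass log lines into (succeeded, other).

    Only the success marker is taken from the logged message template;
    lines without it are NOT assumed to be failures of a known format,
    they are just everything that is not a logged success.
    """
    succeeded = []
    other = []
    for line in lines:
        if SUCCESS_MARKER in line:
            succeeded.append(line)
        else:
            other.append(line)
    return succeeded, other
```

Counting `len(other)` per interval would give a closer approximation of failed attempts than the raw line count of the file, which currently mixes both.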
Change 464077 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Move auth logging to different channels for easier counting
Change 464077 merged by jenkins-bot:
[operations/mediawiki-config@master] Move auth logging to different channels for easier counting
Mentioned in SAL (#wikimedia-operations) [2018-11-01T00:05:47Z] <tgr@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:464077|Move auth logging to different channels for easier counting (T150300, T123243)]] (duration: 00m 53s)
Mentioned in SAL (#wikimedia-operations) [2018-11-01T00:07:13Z] <tgr@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:464077|Move auth logging to different channels for easier counting (T150300, T123243)]] (duration: 00m 53s)
The Security-Team are the ostensible drivers of this work, but we have no resources or plans to work on it, so I'll mark it declined for now.