
nagios monitor transit/peering links and alert on low/high traffic
Closed, Resolved, Public

Description

Author: lcarr

Description:

nagios monitor transit/peering links and alert on low/no traffic in all
locations

Details

Reference
rt1948

Event Timeline

rtimport raised the priority of this task to Medium. Dec 18 2014, 1:03 AM
rtimport added a project: netops.
rtimport set Reference to rt1948.

Subject changed from 'nagios monitor transit/peering links and alert on low/no traffic' to 'nagios monitor transit/peering links and alert on low/high traffic' by lcarr

Status changed from 'new' to 'open' by lcarr

lcarr wrote:

Also on high traffic

Dependency by ticket #6775 added by gage

Dzahn changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)". Feb 24 2015, 11:41 PM
Dzahn changed the edit policy from "WMF-NDA (Project)" to "All Users".
ayounsi added a subscriber: faidon.

Not sure if this is still relevant, given how old the ticket is, but:

High traffic: already being monitored by LibreNMS

Low traffic: hard to monitor and maintain, as the baseline would either have to be set manually or be computed automatically by something that analyzes historical values
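To illustrate what "analyzing historical values" could mean, here is a minimal sketch, assuming the baseline is the median of past rate samples and that "low" means below a hypothetical 20% of that median. The function name and the `fraction` parameter are illustrative only; nothing like this was implemented in this task.

```python
from statistics import median


def low_traffic(history_bps, current_bps, fraction=0.2):
    """Return True if current_bps falls below `fraction` of the historical median.

    Hypothetical helper: the baseline is derived from past samples rather than
    set by hand, which is the maintenance burden described above.
    """
    if not history_bps:
        return False  # no history yet, so there is no baseline to compare against
    baseline = median(history_bps)
    return current_bps < fraction * baseline
```

With samples of 80 to 120 (median 100), a current rate of 15 would alert and 50 would not. A real deployment would also need to handle daily and weekly traffic cycles, which is part of why this approach is hard to maintain.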

No traffic: can easily be monitored using LibreNMS. For that, we would need to:

  1. Remove (or fix) the 2 interfaces that are tagged as Transit/Peering but have never seen traffic
    • cr1-ulsfo.wikimedia.org:xe-0/0/3.98
    • xe-5/2/3.40
  2. Tag the OOB interfaces with something other than Transit (e.g. Transit-OOB, Uplink-OOB), as they are not "revenue" ports and have significantly less bandwidth than standard uplinks
  3. Add a LibreNMS alert that matches an ifOutOctets_rate below 5M, or some other threshold small enough to fire when only control traffic remains on the interface and there is no real traffic
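The logic of steps 2 and 3 can be sketched as follows. This is not LibreNMS rule syntax: it is a plain Python model of the check, assuming `ifOutOctets_rate` is in octets per second and using hypothetical tag names and the "5M" threshold from the list above. OOB links get their own tags, so they are simply not in the monitored set.

```python
from dataclasses import dataclass

# Threshold from the proposal ("5M or something small enough"), assumed to be
# compared against ifOutOctets_rate in octets per second.
NO_TRAFFIC_THRESHOLD = 5_000_000

# Hypothetical tag names; OOB ports are tagged e.g. "Transit-OOB" and excluded.
MONITORED_TAGS = {"Transit", "Peering"}


@dataclass
class Port:
    device: str
    ifname: str
    tag: str                   # e.g. "Transit", "Peering", "Transit-OOB"
    if_out_octets_rate: float  # outbound rate, octets per second


def no_traffic_alerts(ports, threshold=NO_TRAFFIC_THRESHOLD):
    """Return (device, ifname) for monitored ports carrying only control-level traffic."""
    return [
        (p.device, p.ifname)
        for p in ports
        if p.tag in MONITORED_TAGS and p.if_out_octets_rate < threshold
    ]
```

Separate OOB tags keep low-bandwidth out-of-band links from firing constantly, and removing never-used interfaces (step 1) keeps them from alerting forever.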

@faidon does that sound reasonable to you?

Yes, it does. Thanks for working on such an old task!

Regarding item (1) on your list:

  • cr1-ulsfo.wikimedia.org:xe-0/0/3.98: this doesn't seem to be configured right now, so I'm not sure why you mention it. IIRC, it used to be a transit port (GTT's), so maybe it's still stuck like that in LibreNMS?
  • xe-5/2/3.40: in which router? Could be the same situation as above.

Change 348941 had a related patch set uploaded (by Ayounsi):
[operations/puppet@production] LibreNMS macro for T133852 and T80273

https://gerrit.wikimedia.org/r/348941

Change 348941 merged by Ayounsi:
[operations/puppet@production] LibreNMS macro for T133852 and T80273

https://gerrit.wikimedia.org/r/348941

As some links carry only 0.2% of outbound traffic, I added a LibreNMS rule to alert if any transit or peering link drops below 0.1% of outbound traffic.
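The actual rule is in change 348941 and is not reproduced here. As a minimal sketch, assuming "% of outbound traffic" means outbound utilization relative to the port's speed (`ifOutOctets_rate` in octets per second, `ifSpeed` in bits per second), the check could look like this. Function names are hypothetical.

```python
def out_utilization_pct(if_out_octets_rate, if_speed_bps):
    """Outbound utilization in percent: octets/s * 8 bits, over link speed in bit/s."""
    if if_speed_bps <= 0:
        return None  # unknown speed: a percentage cannot be computed
    return if_out_octets_rate * 8 / if_speed_bps * 100


def below_floor(if_out_octets_rate, if_speed_bps, floor_pct=0.1):
    """True if the link's outbound utilization is under floor_pct percent."""
    pct = out_utilization_pct(if_out_octets_rate, if_speed_bps)
    return pct is not None and pct < floor_pct
```

On a 10 Gbit/s link, 2,500,000 octets/s is 0.2% utilization and stays quiet; 1,000,000 octets/s is 0.08% and would alert. Under this interpretation, a 0.1% floor leaves headroom under the least-used links while still catching links that carry only control traffic.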

Will monitor and update if needed.