Juniper monitoring
Closed, Resolved (Public)

Description

Over at T83978 we discovered that asw-ulsfo has had an outstanding chassis alarm for about 4 months. This is just one example of a recurring issue caused by the lack of monitoring for our Juniper routers and switches.

We should create Icinga checks (or something equivalent) for:

  • "show chassis alarms"
  • (critical) BGP peerings
  • critical interfaces being down (e.g. all router interfaces)
  • VRRP
  • virtual-chassis NotPrsnt (or similar)
  • BFD sessions
  • OSPF/OSPFv3 sessions

Event Timeline

rtimport raised the priority of this task to Medium. Dec 18 2014, 1:56 AM
rtimport added a project: ops-core.
rtimport set Reference to rt7654.

Reference to ticket #7642 added by dzahn

http://exchange.nagios.org/directory/Plugins/Hardware/Network-Gear/Cisco/Check-various-hardware-environmental-sensors/details

Status changed from 'new' to 'open' by RT_System

faidon added a project: observability.
faidon changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)".
faidon changed the edit policy from "WMF-NDA (Project)" to "All Users".
faidon set Security to None.

Random possibility: logstash has a plugin to act as a SNMP trap receiver http://logstash.net/docs/1.4.2/inputs/snmptrap

For BGP peerings there must be a Nagios plugin handling it. IIRC there is a standard MIB that lists peering remotes and their status.
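As a rough illustration of the BGP idea, here is a minimal sketch of the decision logic such a plugin could use. It assumes the standard MIB meant here is BGP4-MIB (RFC 4273), whose bgpPeerState object reports each peer as an integer from 1 (idle) to 6 (established). The function name, the input dictionary and the notion of a "critical peers" set are hypothetical; fetching the values over SNMP is deliberately left out.

```python
# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# bgpPeerState values as defined in BGP4-MIB (RFC 4273).
BGP_PEER_STATES = {
    1: "idle",
    2: "connect",
    3: "active",
    4: "opensent",
    5: "openconfirm",
    6: "established",
}
ESTABLISHED = 6


def evaluate_bgp_peers(peer_states, critical_peers):
    """Return (exit_code, message) for a set of peers that must be up.

    peer_states: dict mapping remote address -> integer bgpPeerState
                 (hypothetical input, e.g. the result of an SNMP walk).
    critical_peers: iterable of remote addresses that are expected
                    to be established.
    """
    critical_peers = sorted(set(critical_peers))
    problems = []
    for peer in critical_peers:
        state = peer_states.get(peer)
        if state is None:
            # The router did not report this peer at all.
            problems.append(f"{peer} missing")
        elif state != ESTABLISHED:
            name = BGP_PEER_STATES.get(state, f"unknown({state})")
            problems.append(f"{peer} {name}")
    if problems:
        return CRITICAL, "BGP CRITICAL: " + ", ".join(problems)
    return OK, f"BGP OK: {len(critical_peers)} peers established"
```

A wrapper script would print the message and exit with the returned code, which is the convention Nagios and Icinga plugins follow.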

For chassis alarms, Juniper has a private MIB, mib-jnx-chassis.txt

You can poll the Juniper Alarm MIB. The basic shell script http://exchange.nagios.org/directory/Plugins/Hardware/Network-Gear/Juniper/check_juniperalarm-2Esh/details looks at the yellow/red alarm count. Could be a good start.
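To make the yellow/red count approach concrete, here is a minimal sketch of how two alarm counters could map to a check result. It assumes the counts come from the jnxYellowAlarmCount and jnxRedAlarmCount objects of the Juniper Alarm MIB; the function name and the choice of WARNING for yellow and CRITICAL for red are illustrative assumptions, not taken from the linked script, and the SNMP query itself is omitted.

```python
# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_chassis_alarms(yellow_count, red_count):
    """Map alarm counters to (exit_code, message).

    yellow_count, red_count: integers, assumed to be read from
    jnxYellowAlarmCount and jnxRedAlarmCount (hypothetical inputs here).
    """
    if yellow_count < 0 or red_count < 0:
        # Counters can never be negative; treat this as bad input.
        return UNKNOWN, "ALARMS UNKNOWN: invalid alarm counters"
    summary = f"{red_count} red, {yellow_count} yellow"
    if red_count > 0:
        return CRITICAL, f"ALARMS CRITICAL: {summary}"
    if yellow_count > 0:
        return WARNING, f"ALARMS WARNING: {summary}"
    return OK, f"ALARMS OK: {summary}"
```

Red takes precedence over yellow so that a device with both kinds of alarm reports the more severe state.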

faidon updated the task description.

Change 281467 had a related patch set uploaded (by Faidon Liambotis):
Add check_jnx_alarms to check Juniper chassis alarms

https://gerrit.wikimedia.org/r/281467

Change 281467 merged by Faidon Liambotis:
Add check_jnx_alarms to check Juniper chassis alarms

https://gerrit.wikimedia.org/r/281467

Change 281495 had a related patch set uploaded (by Faidon Liambotis):
netops: monitor all asw/msw/psw as well

https://gerrit.wikimedia.org/r/281495

Change 281495 merged by Faidon Liambotis:
netops: monitor all asw/msw/psw as well

https://gerrit.wikimedia.org/r/281495

A new check has been added to LibreNMS to monitor "show system alarms" (yellow and red), as well as all the moving parts (PSUs, fans, etc.).

Change 369710 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Icinga: add check_bfd check (part 1)

https://gerrit.wikimedia.org/r/369710

Change 369710 merged by Ayounsi:
[operations/puppet@production] Icinga: add check_bfd check (part 1)

https://gerrit.wikimedia.org/r/369710

Change 370103 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Icinga: add check_bfd check (part 1)

https://gerrit.wikimedia.org/r/370103

Change 370103 merged by Ayounsi:
[operations/puppet@production] Icinga: add check_bfd check (part 1)

https://gerrit.wikimedia.org/r/370103

Change 461498 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Icinga, assign bfd check to routers

https://gerrit.wikimedia.org/r/461498

Change 461503 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] SNMP: set snmp-mibs-downloader BASEDIR to Debian 9 standard

https://gerrit.wikimedia.org/r/461503

Change 461503 merged by Ayounsi:
[operations/puppet@production] SNMP: set snmp-mibs-downloader BASEDIR to Debian 9 standard

https://gerrit.wikimedia.org/r/461503

Change 461498 merged by Ayounsi:
[operations/puppet@production] Icinga, assign bfd check to routers

https://gerrit.wikimedia.org/r/461498

Change 496873 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Icinga: Add OSPF check to routers

https://gerrit.wikimedia.org/r/496873

Change 496873 merged by Ayounsi:
[operations/puppet@production] Icinga: Add OSPF check to routers

https://gerrit.wikimedia.org/r/496873