Over at T83978 we discovered that asw-ulsfo has had an outstanding chassis alarm for about 4 months. This is just one example of a recurring problem caused by the lack of monitoring for our Juniper routers and switches.
We should create Icinga checks (or something equivalent) for:
- "show chassis alarms"
- (critical) BGP peerings
- critical interfaces being down (e.g. all router interfaces)
- VRRP
- virtual-chassis NotPrsnt (or similar)
- BFD sessions
- OSPF/OSPFv3 sessions
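
As a starting point for the first item, here is a minimal sketch of the evaluation logic for a chassis-alarm check. It assumes the plain-text output of `show chassis alarms` has already been fetched from the device; how it is fetched (SSH, NETCONF, SNMP) is left open. The function name `evaluate_chassis_alarms` is hypothetical, and mapping Major alarms to CRITICAL and Minor alarms to WARNING is an assumption, not a decision we have made. The return codes follow the usual Nagios/Icinga plugin convention.

```python
import re

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches an alarm row such as:
#   2015-01-01 00:00:00 UTC  Major  PEM 1 Not OK
ALARM_LINE = re.compile(r"\s(Major|Minor)\s+(.+)$")


def evaluate_chassis_alarms(output):
    """Turn the text of 'show chassis alarms' into (exit_code, message).

    Assumption: Major alarms are CRITICAL, Minor alarms are WARNING.
    """
    if "No alarms currently active" in output:
        return OK, "OK: no chassis alarms"

    # Collect (class, description) for every alarm row; the count line
    # and the column header do not match the pattern and are skipped.
    alarms = []
    for line in output.splitlines():
        match = ALARM_LINE.search(line)
        if match:
            alarms.append((match.group(1), match.group(2).strip()))

    if not alarms:
        # Neither the "no alarms" banner nor a parsable alarm row was found.
        return UNKNOWN, "UNKNOWN: could not parse 'show chassis alarms' output"

    has_major = any(alarm_class == "Major" for alarm_class, _ in alarms)
    status = CRITICAL if has_major else WARNING
    label = "CRITICAL" if has_major else "WARNING"
    summary = "; ".join(f"{c}: {d}" for c, d in alarms)
    return status, f"{label}: {len(alarms)} chassis alarm(s): {summary}"
```

An actual Icinga plugin would wrap this by printing the message and exiting with the returned code. The other items in the list (BGP, VRRP, BFD, OSPF) could follow the same pattern of fetching the state, classifying it, and returning a status and message.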