asw-d-eqiad started failing on SNMP today at approximately 07:10 UTC. I noticed because of LibreNMS, but it also happens (intermittently) with snmpwalk from netmon1001 and even from the switch itself:
faidon@asw-d-eqiad> show snmp mib walk ascii 1.3.6.1.2.1
sysDescr.0 = Juniper Networks, Inc. ex4200-48t Ethernet Switch, kernel JUNOS 12.3R3.4, Build date: 2013-06-14 01:37:19 UTC Copyright (c) 1996-2013 Juniper Networks, Inc.
sysObjectID.0 = jnxProductNameEX4200
sysUpTime.0 = 1037958
sysContact.0
sysName.0 = asw-d-eqiad
sysLocation.0 = eqiad
sysServices.0 = 6
Request failed: OID not increasing: sysServices.0 >= sysServices.0
It is likely related to the 4 SFP+ modules that were added to the switch yesterday and connected to the new LVS servers. These servers were in a reboot loop, so these ports flapped many times over the course of the night.
Note that what usually follows sysServices.0 above is the IF-MIB, so this is definitely interfaces-related.
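For reference, "OID not increasing" means a GETNEXT reply carried an OID that does not sort strictly after the one requested, so the walker aborts rather than loop forever. A minimal sketch of that check follows, assuming numeric OIDs; the function names and the sample walk are hypothetical and are not taken from the switch or from any SNMP library:

```python
def parse_oid(text):
    """Turn a dotted numeric OID such as '1.3.6.1.2.1.1.7.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def first_non_increasing(oids):
    """Return (previous, current) for the first OID that does not sort
    strictly after its predecessor, or None if the walk is well ordered."""
    previous = None
    for text in oids:
        current = parse_oid(text)
        # Tuples of ints compare lexicographically, which matches OID ordering.
        if previous is not None and current <= previous:
            return previous, current
        previous = current
    return None


# Hypothetical walk: the agent answers the GETNEXT for sysServices.0
# (1.3.6.1.2.1.1.7.0) with sysServices.0 again instead of the first IF-MIB OID.
walk = [
    "1.3.6.1.2.1.1.5.0",  # sysName.0
    "1.3.6.1.2.1.1.6.0",  # sysLocation.0
    "1.3.6.1.2.1.1.7.0",  # sysServices.0
    "1.3.6.1.2.1.1.7.0",  # sysServices.0 repeated
]
print(first_non_increasing(walk))
```

In a healthy walk the OID after sysServices.0 would be ifNumber.0 (1.3.6.1.2.1.2.1.0), and the function would return None.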
There are no alarms, related log messages or diagnostic messages that could explain this.
What I've tried so far:
- Disabling those ports (and actually powering those servers down)
- Restarting SNMP multiple times
- Switching the master routing engine from FPC 5 to FPC 4
We're now trying:
- Removing those SFP+ from the switch entirely