While cloudsw1-b1-codfw was rebooting for an upgrade, I noticed that the BFD check in Icinga failed. The SNMP poll of the device failed, as expected, but the check script did not handle the failure and exited with an uncaught exception:
Traceback (most recent call last):
  File "/usr/lib/nagios/plugins/check_bfd.py", line 65, in <module>
    main()
  File "/usr/lib/nagios/plugins/check_bfd.py", line 37, in main
    for index in snimpyManager.bfdSessState:
  File "/usr/lib/python3/dist-packages/snimpy/manager.py", line 426, in __iter__
    for k, _ in self.iteritems():
  File "/usr/lib/python3/dist-packages/snimpy/manager.py", line 451, in iteritems
    for noid, result in self.session.walk(oid):
  File "/usr/lib/python3/dist-packages/snimpy/manager.py", line 127, in walk
    return self.getorwalk("walkmore", *args)
  File "/usr/lib/python3/dist-packages/snimpy/manager.py", line 112, in getorwalk
    value = getattr(self._session, op)(*args)
  File "/usr/lib/python3/dist-packages/snimpy/snmp.py", line 311, in walkmore
    return self._op(self._cmdgen.bulkCmd, *args)
  File "/usr/lib/python3/dist-packages/snimpy/snmp.py", line 267, in _op
    raise SNMPException(str(errorIndication))
snimpy.snmp.SNMPException: No SNMP response received before timeout
I think we need to add exception handling for snimpy.snmp.SNMPException. The same gap seems to be common to all our SNMP-based checks, so I'll try to address them all.
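
A rough sketch of what the handling could look like, not the actual patch. It assumes each plugin's polling logic can be wrapped in a callable that returns a Nagios exit code. The names `run_check` and `UNKNOWN` are made up for illustration, and the local `SNMPException` class is only a stand-in so the snippet runs without snimpy installed; the real plugins would import `SNMPException` from `snimpy.snmp` instead. Mapping a poll failure to UNKNOWN (exit code 3) is also an assumption; CRITICAL may be preferable for some checks.

```python
import sys

# Standard Nagios/Icinga plugin exit code for "state could not be determined".
UNKNOWN = 3


class SNMPException(Exception):
    """Stand-in for snimpy.snmp.SNMPException (assumption: the real plugin
    would use `from snimpy.snmp import SNMPException` instead)."""


def run_check(check, snmp_errors=(SNMPException,)):
    """Run `check` (a callable returning an exit code) and turn SNMP
    failures into a clean UNKNOWN result instead of a traceback."""
    try:
        return check()
    except snmp_errors as exc:
        # One-line plugin output, so Icinga shows the cause of the failure.
        print(f"UNKNOWN: SNMP poll failed: {exc}")
        return UNKNOWN


def failing_poll():
    # Simulates the device being unreachable during a reboot.
    raise SNMPException("No SNMP response received before timeout")


if __name__ == "__main__":
    sys.exit(run_check(failing_poll))
```

Because the wrapper takes the exception classes as a parameter, the same helper could be shared by the other SNMP-based checks rather than repeating the try/except in each script.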