Today Icinga reported the following (times in UTC+2):
```
06:25 <icinga-wm> PROBLEM - Host cp3032 is DOWN: PING CRITICAL - Packet loss = 100%
06:31 <icinga-wm> PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp1066 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp1052 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:32 <icinga-wm> PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:32 <icinga-wm> PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:32 <icinga-wm> PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:32 <icinga-wm> PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
```
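Every IPsec alert names the same two tunnels, cp3032_v4 and cp3032_v6, which suggests they are a side effect of cp3032 itself being down rather than independent failures. A minimal Python sketch for checking this when the alert list is long, assuming the alert text keeps the `IPsec on <host> is CRITICAL: ... not-conn: <tunnels>` shape seen above and that tunnel names follow a `<host>_v4` / `<host>_v6` pattern. `down_peers` is a hypothetical helper, not part of Icinga or strongSwan:

```python
import re
from collections import defaultdict

# A few alert lines copied from the Icinga output above (abbreviated).
ALERTS = """\
06:31 <icinga-wm> PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
06:31 <icinga-wm> PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp3032_v4, cp3032_v6
06:32 <icinga-wm> PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp3032_v4, cp3032_v6
"""

# Pattern assumed from the alert format above: the reporting host, then the
# comma-separated list of tunnels that are not connected.
IPSEC_RE = re.compile(r"IPsec on (\S+) is CRITICAL: .*not-conn: (.+)$")


def down_peers(text):
    """Map each tunnel peer host to the set of hosts reporting it as not connected."""
    peers = defaultdict(set)
    for line in text.splitlines():
        m = IPSEC_RE.search(line)
        if not m:
            continue  # e.g. the "Host ... is DOWN" line
        reporter, tunnels = m.groups()
        for tunnel in tunnels.split(","):
            # Assumed naming "<host>_v4" / "<host>_v6": drop the suffix.
            peers[tunnel.strip().rsplit("_", 1)[0]].add(reporter)
    return dict(peers)


if __name__ == "__main__":
    for peer, reporters in down_peers(ALERTS).items():
        print(peer, sorted(reporters))
```

On the three sample lines this prints `cp3032 ['cp1054', 'cp1068', 'cp2023']`: a single peer, reported from both eqiad-style (cp10xx) and codfw-style (cp20xx) hosts.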
The host is not reachable via SSH but is still available via the console. dmesg shows several errors related to the bnx2x driver:
```
[Thu Jun 1 04:21:35 2017] ------------[ cut here ]------------
[Thu Jun 1 04:21:35 2017] WARNING: CPU: 2 PID: 0 at /home/zumbi/linux-4.9.13/net/sched/sch_generic.c:316 dev_watchdog+0x220/0x230
[Thu Jun 1 04:21:35 2017] NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 0 timed out
[Thu Jun 1 04:21:35 2017] Modules linked in: tcp_bbr(E) sch_fq(E) binfmt_misc(E) esp6(E) xfrm6_mode_transport(E) hmac(E) drbg(E) ansi_cprng(E) cpufreq_conservative(E) seqiv(E) cpufreq_userspace(E) xfrm4_mode_transport(E) cpufreq_powersave(E) 8021q(E) garp(E) mrp(E) stp(E) llc(E) xfrm_user(E) xfrm4_tunnel(E) tunnel4(E) ipcomp(E) xfrm_ipcomp(E) esp4(E) ah4(E) af_key(E) xfrm_algo(E) intel_rapl(E) sb_edac(E) edac_core(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm(E) mgag200(E) ttm(E) irqbypass(E) crct10dif_pclmul(E) drm_kms_helper(E) crc32_pclmul(E) ipmi_watchdog(E) iTCO_wdt(E) ghash_clmulni_intel(E) intel_cstate(E) iTCO_vendor_support(E) drm(E) evdev(E) dcdbas(E) i2c_algo_bit(E) lpc_ich(E) mei_me(E) intel_rapl_perf(E) pcspkr(E) mfd_core(E) mei(E) shpchp(E) wmi(E) tpm_tis(E) tpm_tis_core(E) tpm(E)
[Thu Jun 1 04:21:35 2017] acpi_power_meter(E) button(E) ipmi_si(E) ipmi_poweroff(E) ipmi_devintf(E) ipmi_msghandler(E) autofs4(E) ext4(E) crc16(E) jbd2(E) fscrypto(E) mbcache(E) raid1(E) md_mod(E) sg(E) sd_mod(E) ahci(E) ehci_pci(E) libahci(E) ehci_hcd(E) libata(E) bnx2x(E) aesni_intel(E) aes_x86_64(E) ptp(E) glue_helper(E) pps_core(E) lrw(E) mdio(E) gf128mul(E) ablk_helper(E) libcrc32c(E) cryptd(E) usbcore(E) crc32c_generic(E) scsi_mod(E) usb_common(E) crc32c_intel(E) fjes(E)
[Thu Jun 1 04:21:35 2017] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
[Thu Jun 1 04:21:35 2017] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[Thu Jun 1 04:21:35 2017] 0000000000000000 ffffffff97529cd5 ffff9f47bf843e38 0000000000000000
[Thu Jun 1 04:21:35 2017] ffffffff972778a4 0000000000000000 ffff9f47bf843e90 ffff9f47ac47c000
[Thu Jun 1 04:21:35 2017] 0000000000000002 ffff9f47b1267100 000000000000005b ffffffff9727791f
[Thu Jun 1 04:21:35 2017] Call Trace:
[Thu Jun 1 04:21:35 2017] <IRQ>
[Thu Jun 1 04:21:35 2017] [<ffffffff97529cd5>] ? dump_stack+0x5c/0x77
[Thu Jun 1 04:21:35 2017] [<ffffffff972778a4>] ? __warn+0xc4/0xe0
[Thu Jun 1 04:21:35 2017] [<ffffffff9727791f>] ? warn_slowpath_fmt+0x5f/0x80
[Thu Jun 1 04:21:35 2017] [<ffffffff97720d70>] ? dev_watchdog+0x220/0x230
[Thu Jun 1 04:21:35 2017] [<ffffffff97720b50>] ? dev_deactivate_queue.constprop.27+0x60/0x60
[Thu Jun 1 04:21:35 2017] [<ffffffff972e6240>] ? call_timer_fn+0x30/0x130
[Thu Jun 1 04:21:35 2017] [<ffffffff972e782c>] ? run_timer_softirq+0x1dc/0x440
[Thu Jun 1 04:21:35 2017] [<ffffffff972f6c80>] ? tick_sched_handle.isra.13+0x20/0x50
[Thu Jun 1 04:21:35 2017] [<ffffffff972f72a8>] ? tick_sched_timer+0x38/0x70
[Thu Jun 1 04:21:35 2017] [<ffffffff977fdf26>] ? __do_softirq+0x106/0x292
[Thu Jun 1 04:21:35 2017] [<ffffffff9727db28>] ? irq_exit+0x98/0xa0
[Thu Jun 1 04:21:35 2017] [<ffffffff977fdd2e>] ? smp_apic_timer_interrupt+0x3e/0x50
[Thu Jun 1 04:21:35 2017] [<ffffffff977fd042>] ? apic_timer_interrupt+0x82/0x90
[Thu Jun 1 04:21:35 2017] <EOI>
[Thu Jun 1 04:21:35 2017] [<ffffffff976c2153>] ? cpuidle_enter_state+0x113/0x260
[Thu Jun 1 04:21:35 2017] [<ffffffff972bbfce>] ? cpu_startup_entry+0x17e/0x260
[Thu Jun 1 04:21:35 2017] [<ffffffff9724846d>] ? start_secondary+0x14d/0x190
[Thu Jun 1 04:21:35 2017] ---[ end trace db9d931b0691cee2 ]---
[Thu Jun 1 04:21:35 2017] bnx2x: [bnx2x_stats_comp:205(eth0)]timeout waiting for stats finished
[Thu Jun 1 04:21:35 2017] bnx2x: [bnx2x_stats_comp:205(eth0)]timeout waiting for stats finished
[Thu Jun 1 04:21:37 2017] bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(16172) != txdata->tx_pkt_cons(16171)
[Thu Jun 1 04:21:39 2017] bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[1]: txdata->tx_pkt_prod(25551) != txdata->tx_pkt_cons(25334)
[Thu Jun 1 04:21:41 2017] bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[2]: txdata->tx_pkt_prod(65069) != txdata->tx_pkt_cons(64844)
[..]
```
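Each `bnx2x_clean_tx_queue` timeout reports a producer index and a consumer index that differ; read loosely, that is the number of packets queued for transmit that the NIC had not reported as completed when the driver gave up waiting. The sketch below extracts that gap per queue. It assumes the indices are unsigned 16-bit counters that wrap, so the difference is taken modulo 2^16; that width, the helper name `pending_tx` and its `counter_bits` parameter are assumptions chosen for illustration, not taken from the driver output:

```python
import re

# The three tx-queue timeout lines from the dmesg output above.
DMESG = """\
bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(16172) != txdata->tx_pkt_cons(16171)
bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[1]: txdata->tx_pkt_prod(25551) != txdata->tx_pkt_cons(25334)
bnx2x: [bnx2x_clean_tx_queue:1205(eth0)]timeout waiting for queue[2]: txdata->tx_pkt_prod(65069) != txdata->tx_pkt_cons(64844)
"""

TX_RE = re.compile(
    r"queue\[(\d+)\]: txdata->tx_pkt_prod\((\d+)\) != txdata->tx_pkt_cons\((\d+)\)"
)


def pending_tx(text, counter_bits=16):
    """Return {queue: produced-but-not-consumed count} for each timeout line.

    Assumes prod/cons are unsigned counters of `counter_bits` bits, so the
    difference is reduced modulo 2**counter_bits to survive a wraparound.
    """
    mod = 1 << counter_bits
    return {
        int(q): (int(prod) - int(cons)) % mod
        for q, prod, cons in TX_RE.findall(text)
    }


if __name__ == "__main__":
    for queue, pending in pending_tx(DMESG).items():
        print(f"queue[{queue}]: {pending} packet(s) outstanding")
```

For the three lines above this yields `{0: 1, 1: 217, 2: 225}`: queue 0 is behind by a single packet, while queues 1 and 2 have a couple of hundred packets outstanding. Queue 2's producer (65069) is close to the assumed 16-bit limit, which is why the modulo matters if the counter wraps before the consumer catches up.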