Furud has been flapping most of the day
Closed, Resolved (Public)

Description

https://ganglia.wikimedia.org/latest/?c=Miscellaneous%20codfw&h=furud.codfw.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2

[10:57:51] Start of my client window
[10:58:18] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[11:00:29] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:46:40] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[12:01:38] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:16:08] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[12:24:19] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:45:10] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[12:49:18] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:57:19] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[12:59:28] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[13:09:39] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[13:15:30] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[13:47:39] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[13:51:19] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[13:56:59] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[14:12:18] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:17:49] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[14:21:38] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:27:18] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[14:29:08] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:40:48] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[14:44:28] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:55:48] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[15:09:08] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[15:31:49] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[15:33:48] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[15:39:28] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[15:41:18] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[15:48:48] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[16:00:10] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[16:05:59] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[16:28:29] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[16:51:38] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[17:02:12] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[17:09:01] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[17:25:52] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[17:32:28] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[17:54:27] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[18:03:44] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[18:13:04] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[18:18:35] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[18:33:22] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[18:38:40] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[18:55:51] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[19:02:17] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
[19:03:36] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
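To quantify the flapping, the log above can be reduced to one outage duration per PROBLEM/RECOVERY pair. The following is only a rough sketch: the function name `outage_durations` and the regex are hypothetical helpers, not part of Icinga or any existing tooling, and it assumes all timestamps fall on the same day (no midnight rollover).

```python
import re
from datetime import datetime

# Matches lines such as:
# [10:58:18] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer
LINE_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] <icinga-wm> (PROBLEM|RECOVERY) - (.+?) on (\S+) is "
)


def outage_durations(lines, service="SSH", host="furud"):
    """Pair each PROBLEM with the next RECOVERY for one service on one host.

    Assumption: timestamps are from a single day, so no date handling is done.
    """
    durations = []
    started = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not an icinga-wm alert line
        stamp, kind, svc, hst = match.groups()
        if svc != service or hst != host:
            continue
        when = datetime.strptime(stamp, "%H:%M:%S")
        if kind == "PROBLEM":
            if started is None:
                started = when  # remember the start of this outage
        elif started is not None:
            durations.append(when - started)
            started = None
    return durations


# First two PROBLEM/RECOVERY pairs copied from the log above.
sample = [
    "[10:58:18] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer",
    "[11:00:29] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)",
    "[11:46:40] <icinga-wm> PROBLEM - SSH on furud is CRITICAL: Server answer",
    "[12:01:38] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)",
]

for duration in outage_durations(sample):
    print(int(duration.total_seconds()), "seconds")
```

Feeding the full log through the same function would give the total time SSH was reported CRITICAL during the client window.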

Event Timeline

Peachey88 claimed this task.

[20:58:41] <godog> !log reboot furud.codfw.wmnet, ganeti instance with increasing load and 100% iowait, kvm/ganeti idle instance bug likely T134098
[20:58:49] <morebots> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:58:57] <icinga-wm> RECOVERY - salt-minion processes on furud is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[20:59:16] <icinga-wm> RECOVERY - Check size of conntrack table on furud is OK: OK: nf_conntrack is 0 % full
[20:59:36] <icinga-wm> RECOVERY - DPKG on furud is OK: All packages OK
[20:59:46] <icinga-wm> RECOVERY - RAID on furud is OK: OK: no RAID installed
[20:59:57] <icinga-wm> RECOVERY - SSH on furud is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[21:00:36] <icinga-wm> RECOVERY - puppet last run on furud is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:17:22] <icinga-wm> RECOVERY - NTP on furud is OK: NTP OK: Offset 0.001899242401 secs

Peachey88 assigned this task to fgiunchedi.