cp2017 froze and stopped serving traffic
Closed, Resolved, Public

Description

Times in UTC:

19:32  <icinga-wm> PROBLEM - Host cp2017 is DOWN: PING CRITICAL - Packet loss = 100%
19:36  <icinga-wm> PROBLEM - IPsec on cp3049 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2017_v4, cp2017_v6
19:36  <icinga-wm> PROBLEM - IPsec on cp4014 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2017_v4, cp2017_v6
[..]

The host was frozen on the console too. After a powercycle it came back up and, as far as I could see, was serving traffic correctly, but I preferred to depool it (sudo -i depool from cp2017) for a quick investigation.

Event Timeline

Nothing particularly interesting in kern.log except perhaps for the hrtimer message? Host powercycled at 19:55:

Feb 24 06:25:06 cp2017 kernel: [1346616.572311] Process accounting resumed
Feb 25 06:25:06 cp2017 kernel: [1433015.851722] Process accounting resumed
Feb 25 16:11:04 cp2017 kernel: [1468173.051697] hrtimer: interrupt took 629511 ns
Feb 25 19:55:26 cp2017 kernel: [    0.000000] Initializing cgroup subsys cpuset
Feb 25 19:55:26 cp2017 kernel: [    0.000000] Initializing cgroup subsys cpu
Feb 25 19:55:26 cp2017 kernel: [    0.000000] Initializing cgroup subsys cpuacct
Feb 25 19:55:26 cp2017 kernel: [    0.000000] Linux version 4.4.0-3-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.4.2-3+wmf8 (2016-12-22)
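For anyone repeating this kind of check on another host, here is a minimal sketch of spotting the reboot boundary in kern.log by looking for the point where the kernel uptime counter (the value in square brackets) goes backwards. The sample lines are copied from the excerpt above; the names `LINE_RE`, `parse` and `find_reboots` are made up for illustration, and the regex assumes the syslog line format shown here.

```python
import re

# Sample lines copied from the kern.log excerpt above.
KERN_LOG = """\
Feb 25 06:25:06 cp2017 kernel: [1433015.851722] Process accounting resumed
Feb 25 16:11:04 cp2017 kernel: [1468173.051697] hrtimer: interrupt took 629511 ns
Feb 25 19:55:26 cp2017 kernel: [    0.000000] Initializing cgroup subsys cpuset
"""

# Assumed format: "<Mon> <day> <HH:MM:SS> <host> kernel: [<uptime>] <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"\[\s*(?P<uptime>\d+\.\d+)\] (?P<msg>.*)$"
)


def parse(text):
    """Return a list of (wall_clock_stamp, uptime_seconds, message) tuples."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append((m["stamp"], float(m["uptime"]), m["msg"]))
    return entries


def find_reboots(entries):
    """Return (last_before, first_after) pairs where the uptime counter resets."""
    return [
        (prev, cur)
        for prev, cur in zip(entries, entries[1:])
        if cur[1] < prev[1]
    ]


if __name__ == "__main__":
    for last, first in find_reboots(parse(KERN_LOG)):
        print("last message before reboot:", last[0], "-", last[2])
        print("uptime at that point (days):", round(last[1] / 86400, 2))
        print("first message after reboot:", first[0])
```

On the excerpt above, the last kernel message before the reboot is the hrtimer line at 16:11:04, a few hours before the 19:32 host-down alert, so the kernel logged nothing at the time of the freeze itself.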
ema triaged this task as Medium priority. Feb 26 2017, 2:06 PM
BBlack claimed this task.
BBlack subscribed.

No recurrence AFAIK, closing.