vhtcpd unexpectedly stopped running on cp1075 11 hours ago:
2019-12-15 20:44 <+icinga-wm> PROBLEM - Varnish HTCP daemon on cp1075 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 115 (vhtcpd), args vhtcpd https://wikitech.wikimedia.org/wiki/Varnish
The reason for this is a segmentation fault:
Dec 15 20:38:42 cp1075 kernel: vhtcpd[148095]: segfault at 0 ip 00007f17d82fc231 sp 00007fff7bddc828 error 4 in libc-2.24.so[7f17d81d3000+195000]
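The kernel line above packs the fault details into hex fields. Below is a minimal Python sketch (`decode_segfault` is a hypothetical helper, not an existing tool) that splits it into the faulting address, the instruction offset inside the mapped object, and the x86 page-fault error bits. It assumes the x86-64 message format shown here. Note that the offset is relative to the start of the reported mapping, which equals the file offset only if that mapping begins at file offset 0.

```python
import re

# x86 kernel segfault message; every numeric field is printed in hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Split a kernel segfault line into fault address, object offset and error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    offset = ip - base  # instruction pointer relative to the start of the mapping
    if not 0 <= offset < size:
        raise ValueError("ip is outside the reported mapping")
    return {
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        "offset": offset,
        # x86 page-fault error code bits
        "protection_violation": bool(err & 0x1),  # 0 means the page was not present
        "write": bool(err & 0x2),                 # 0 means the access was a read
        "user_mode": bool(err & 0x4),
        "instruction_fetch": bool(err & 0x10),
    }


line = ("vhtcpd[148095]: segfault at 0 ip 00007f17d82fc231 sp 00007fff7bddc828 "
        "error 4 in libc-2.24.so[7f17d81d3000+195000]")
info = decode_segfault(line)
print(info["object"], hex(info["offset"]))  # libc-2.24.so 0x129231
```

Read this way, the crash is a user-mode read of address 0 (a NULL pointer dereference) at offset 0x129231 inside libc-2.24.so. That offset could then be resolved against libc debug symbols, for example with gdb or addr2line, to find which libc function vhtcpd was calling.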
I depooled the host at 07:52.
The unit logs do not show anything interesting, just that the process exited:
root@cp1075:~# systemctl status vhtcpd.service
● vhtcpd.service - LSB: vhtcpd
Loaded: loaded (/etc/init.d/vhtcpd; generated; vendor preset: enabled)
Active: active (exited) since Wed 2019-09-25 00:02:22 UTC; 2 months 21 days ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 41779)
CGroup: /system.slice/vhtcpd.service
Dec 15 19:47:22 cp1075 vhtcpd[148095]: Purger1: input: 15701889632 failed: 32 q_size: 1030 q_mem: 191766 q_max_size: 11100 q_max_mem: 3623507
Dec 15 20:02:22 cp1075 vhtcpd[148095]: start: 1569369742 uptime: 7070399 purgers: 2 recvd: 15968398902 bad: 2 filtered: 0
Dec 15 20:02:22 cp1075 vhtcpd[148095]: Purger0: input: 15968398900 failed: 1355211 q_size: 264290641 q_mem: 40499198184 q_max_size: 264290641 q_max_mem: 40499198184
Dec 15 20:02:22 cp1075 vhtcpd[148095]: Purger1: input: 15702753047 failed: 32 q_size: 977 q_mem: 211345 q_max_size: 11100 q_max_mem: 3623507
Dec 15 20:17:22 cp1075 vhtcpd[148095]: start: 1569369742 uptime: 7071300 purgers: 2 recvd: 15970859344 bad: 2 filtered: 0
Dec 15 20:17:22 cp1075 vhtcpd[148095]: Purger0: input: 15970859342 failed: 1355211 q_size: 265942528 q_mem: 40675459219 q_max_size: 265942594 q_max_mem: 40675467672
Dec 15 20:17:22 cp1075 vhtcpd[148095]: Purger1: input: 15703561602 failed: 32 q_size: 1165 q_mem: 202575 q_max_size: 11100 q_max_mem: 3623507
Dec 15 20:32:22 cp1075 vhtcpd[148095]: start: 1569369742 uptime: 7072200 purgers: 2 recvd: 15973436580 bad: 2 filtered: 0
Dec 15 20:32:22 cp1075 vhtcpd[148095]: Purger0: input: 15973436578 failed: 1355211 q_size: 267722646 q_mem: 40879916804 q_max_size: 267722646 q_max_mem: 40879916804
Dec 15 20:32:22 cp1075 vhtcpd[148095]: Purger1: input: 15704358720 failed: 32 q_size: 1079 q_mem: 162098 q_max_size: 11100 q_max_mem: 3623507

Note that (1) the status is active instead of inactive (or failed), and (2) the process did not get restarted by systemd. The unit specifically has Restart=no, so I assume this is intentional?