
cp5006 multiple alerts (and SSH flapping)
Closed, Resolved · Public

Description

cp5006 had multiple service warnings, plus SSH flapping, starting June 26 at ~04:00 UTC.

Logging in to the mgmt console was not possible either.

Event Timeline

JMeybohm created this task. Fri, Jun 26, 7:43 AM
Restricted Application added a subscriber: Aklapper. Fri, Jun 26, 7:43 AM

Mentioned in SAL (#wikimedia-operations) [2020-06-26T07:44:12Z] <volans> force rebooted cp5006 that is unresponsive (after having depooled it) - T256449

Host is back up. The console output during boot was all borked (a stream of unreadable, mis-encoded characters), but the kernel boot logs were readable as normal. Maybe the console redirection is misconfigured?
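One frequent cause of garbled serial output that becomes readable once the kernel takes over is a baud-rate mismatch between the firmware's serial redirection and the kernel's `console=` parameter. That cause is an assumption here, not something this task confirmed. A minimal Python sketch of the check parses the kernel command line (format `console=ttyS1,115200n8`) and reports serial consoles whose baud rate differs from a given firmware setting; the function names are hypothetical helpers, not existing tooling.

```python
import re
from typing import List, Optional, Tuple

# Matches kernel "console=" parameters such as "console=ttyS1,115200n8".
# The leading (?:^|\s) keeps "netconsole=..." from being picked up.
CONSOLE_RE = re.compile(r"(?:^|\s)console=(?P<dev>[^,\s]+)(?:,(?P<baud>\d+)\S*)?")


def serial_consoles(cmdline: str) -> List[Tuple[str, Optional[int]]]:
    """Return (device, baud) for every serial console= parameter on the command line."""
    consoles = []
    for m in CONSOLE_RE.finditer(cmdline):
        dev = m.group("dev")
        if not dev.startswith("ttyS"):
            continue  # tty0 and similar are not serial ports, so they carry no baud rate
        baud = int(m.group("baud")) if m.group("baud") else None
        consoles.append((dev, baud))
    return consoles


def baud_mismatches(cmdline: str, redirection_baud: int) -> List[str]:
    """Serial consoles whose kernel baud rate differs from the firmware redirection rate."""
    return [dev for dev, baud in serial_consoles(cmdline)
            if baud is not None and baud != redirection_baud]
```

On a live host the command line would come from `/proc/cmdline`, e.g. `baud_mismatches(open("/proc/cmdline").read(), 115200)`, where `115200` stands in for whatever rate the BIOS serial redirection is actually set to.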

Nothing in syslog since 04:40:01 this morning.
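A silence like this can also be located mechanically. The sketch below is a hypothetical helper, `largest_gap`, that assumes the classic syslog prefix (`Jun 26 07:46:14`). That prefix carries no year, so the caller supplies one, and logs spanning a year boundary would need extra handling. It returns the timestamps on either side of the longest gap between consecutive lines.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def largest_gap(lines: Iterable[str], year: int) -> Optional[Tuple[datetime, datetime]]:
    """Return (last line before, first line after) the longest silence in a syslog."""
    prev = None
    best = None
    for line in lines:
        try:
            # The first 15 characters hold the yearless timestamp, e.g. "Jun 26 07:46:14".
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip malformed or continuation lines
        if prev is not None and (best is None or ts - prev > best[1] - best[0]):
            best = (prev, ts)
        prev = ts
    return best
```

Usage would look like `largest_gap(open("/var/log/syslog"), 2020)`. The path is the usual Debian location and may differ per host.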

Unrelated: during boot, a rather spammy log message is repeated a huge number of times:

Jun 26 07:46:14 cp5006 atsmtail-backend[1136]: 2020/06/26 07:46:14 Unable to read from socket: dial unix /var/run/trafficserver/notpurge.sock: connect: no such file or directory
$ grep -c ' Unable to read from socket: dial unix' syslog
20659

Might be worth investigating.
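The `grep -c` above gives only a total. A minimal Python sketch, assuming the syslog line format quoted above, breaks the same messages down by process, socket path and error, which would show whether `notpurge.sock` is the only socket affected. The regex and the name `summarize_socket_errors` are illustrative, not part of any existing tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern modelled on the atsmtail-backend line quoted above.
SOCKET_ERR_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[\d+\]: .*Unable to read from socket: "
    r"dial unix (?P<path>\S+): connect: (?P<err>.+)$"
)


def summarize_socket_errors(lines: Iterable[str]) -> Counter:
    """Count socket-dial errors per (process, socket path, error) triple."""
    counts: Counter = Counter()
    for line in lines:
        m = SOCKET_ERR_RE.search(line.rstrip("\n"))
        if m:
            counts[(m["proc"], m["path"], m["err"])] += 1
    return counts
```

Calling `summarize_socket_errors(open("/var/log/syslog")).most_common(5)` would list the noisiest sources first.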

ema closed this task as Resolved. Fri, Jun 26, 10:29 AM
ema claimed this task.
ema added a subscriber: ema.

The host looks fine, closing for now.