cloudbackup2001 lockup on 2023-05-05
Closed, Resolved (Public)

Description

cloudbackup2001 stopped responding to nrpe and ssh today. When I tried to connect to the console via mgmt, it was too slow to even display a password prompt.

Post-reboot, I don't see anything very interesting in syslog but I suspect this is a side-effect of prometheus being extra busy due to T335943.

Note for future incidents: There's an LVM bug on this host that prevents /srv/cinder-backups from mounting on reboot, which causes boot to drop into emergency mode. The volume shows as 'LV Status NOT available', which can be resolved with "lvchange -ay /dev/backup/cinder-backups". This seems to be an upstream bug (https://access.redhat.com/solutions/4497071) which I'm trying to ignore since this hardware will be refreshed in a few months.
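
For whoever hits this next, here is a minimal sketch of that recovery as it might be run from the emergency shell. The LV path and mount point are the ones from the note above; the function names are hypothetical, and the final `mount` call assumes /srv/cinder-backups has an /etc/fstab entry, which has not been verified here.

```shell
#!/bin/sh
# Sketch only: manual recovery for the inactive cinder-backups LV.
# LV_PATH and MOUNT_POINT come from the task description; the
# function names are made up for illustration.

LV_PATH=/dev/backup/cinder-backups
MOUNT_POINT=/srv/cinder-backups

# Reads `lvdisplay` output on stdin and succeeds if the LV is
# reported as 'LV Status NOT available'.
lv_is_inactive() {
    grep -Eq 'LV Status[[:space:]]+NOT available'
}

# Activates the LV if it is inactive, then mounts it if it is not
# already mounted. Assumption: the mount point is listed in /etc/fstab.
recover_cinder_backups() {
    if lvdisplay "$LV_PATH" | lv_is_inactive; then
        lvchange -ay "$LV_PATH"
    fi
    mountpoint -q "$MOUNT_POINT" || mount "$MOUNT_POINT"
}
```

Running `recover_cinder_backups` as root and then exiting the emergency shell should let boot continue, assuming the inactive LV was the only thing blocking it.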

Event Timeline

@Andrew I went back through the lifecycle logs on the idrac and I could not find a cause for the ssh going down or the lagged response. I inspected the mgmt cable and it does not appear to be degraded. Is there anything I can do physically to help with this issue?

@Jhancock.wm can you please check which iDRAC firmware version is installed and what the latest version on Dell's website is? Thanks.

iDRAC firmware on the host is version 3.32
Dell's latest version is 6.10
I will start the upgrade process on this. Good call, and thank you @Papaul

@Andrew this is now resolved. Please let us know if you still have issues. Thanks