
aqs1012: reseat SSD (/dev/sdh)?
Closed, Resolved, Public


I'm in the midst of re-imaging aqs1012.eqiad.wmnet, and one of the host's 8 block devices seems to have gone missing (/dev/sdh). I'm not seeing any errors in dmesg (or in the web console, see below); the device is simply not there. I'm hoping that reseating the drive might help.
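A quick way to list which expected disks are absent is sketched below in Python. It assumes the host should expose eight disks named sda through sdh and that /sys/block lists the block devices the kernel currently knows about; the helper names are hypothetical. Because the kernel assigns sdX names in probe order, a drive missing from a middle slot typically shows up as the last name (here sdh) being absent, not the name matching its physical slot.

```python
import os
import string


def expected_devices(count: int = 8) -> list[str]:
    """Hypothetical helper: expected names sda, sdb, ... for `count` disks."""
    return ["sd" + string.ascii_lowercase[i] for i in range(count)]


def missing_block_devices(expected: list[str], sys_block: str = "/sys/block") -> list[str]:
    """Return the expected device names that have no entry under sys_block."""
    # If the directory does not exist, treat every expected device as missing.
    present = set(os.listdir(sys_block)) if os.path.isdir(sys_block) else set()
    return [dev for dev in expected if dev not in present]


if __name__ == "__main__":
    missing = missing_block_devices(expected_devices(8))
    print("missing:", ", ".join(missing) if missing else "none")
```

This only compares device names; it says nothing about which physical slot is empty, which still has to be checked in the iDRAC or controller view.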

The host has already been down for most of a day, and I would prefer not to extend the outage much longer. If no one is available to do this in the next few hours, I'll probably move on to Plan B instead.

[image.png: web console screenshot]

Edit: The server is powered down and ready.

Event Timeline

Eevans triaged this task as High priority. Nov 15 2023, 3:31 PM

[Screenshot 2023-11-16 at 11.26.33 AM.png]
Reseated the hard drives and updated the iDRAC and BIOS firmware.

I confirmed that all the disks were present before proceeding, but after restarting via the reimage cookbook, disk 4 is missing again.

[image.png]

Eevans assigned this task to Jclark-ctr.

This is done. Thanks!