db1169 reimage/idrac failure
Closed, Resolved, Public

Description

I am trying to reimage db1169 to Bullseye, but it fails during the reboot.
I tried connecting to the iDRAC to watch the host boot and identify what is failing, but the console doesn't show anything. I am able to power the host off and on via the iDRAC (and can see the power state change from off to on), but I am unable to see the boot process.
The network (or the installer) isn't coming up as I am unable to ping the host from anywhere.

I have also sent a cold restart to the iDRAC and a hard reset to the host, but after coming back up the behaviour is the same. I am unable to see what is going on during boot, and the host never reaches the Debian installer (or a normal boot), so it is getting stuck somewhere.

Can someone please take a look on-site to see what is being shown there and/or drain the host and idrac to see if the behaviour changes and I can see what is going on?

Event Timeline

Marostegui created this task.

I am setting this to high because this is a live s1 host and we need to test Bullseye there to make sure we are ready for it. That would let us confirm whether T297913: Confirm support of PERC 750 raid controller can be unblocked, so that we can go ahead and order the hosts and install them directly with Bullseye.

wiki_willy added a subscriber: Jclark-ctr.

Hi @Cmjohnson - just a heads up, this one is high priority. Thanks, Willy

@Marostegui At first glance the settings are correct, but it's definitely stuck in a weird boot process. I am updating the firmware first and will go from there.

Thank you @Cmjohnson - once it is able to boot up, I can take it from there and attempt a reimage.

@Marostegui The server was hanging during POST in the memory collection process. I ended up removing all the DIMMs except A1 and B1, and the server booted properly. I updated the BIOS and then started adding the DIMMs back two at a time for each CPU. The server boots fine now, and I also want to add that no hardware errors have been detected. If the problems return, please let me know.

Thanks Chris, going to try a reimage then! I will let you know how it goes.

@Cmjohnson the host got reimaged fine. Thank you for fixing this so fast!