
labsdb1009 boot issues (power supply and controller?)
Closed, Resolved, Public

Description

I restarted labsdb1009 and it went straight to PXE boot and started a reimage. I hurried to cancel it, but I do not think the reimage went through anyway, as the disk failed to be recognized.

When it restarted, it gave the following errors, which would explain why it is booting from the network:

1623-Power Supply Failure - Power Supply 2 is failed. 
Action: Replace the power supply.
Disk Slot 1  HP Smart Array P840 Controller         (4 GB, v3.56)  0 Logical Drives
**** Boot Logical Drive is configured but is Missing or Offline.

I believe these two issues go beyond a mere boot order problem, but feel free to reconfirm (could the failed power supply be the one powering the controller?). I have left the server at the boot configuration screen to keep it out of an infinite loop of reboots or reimages.

Event Timeline

Restricted Application added subscribers: Southparkfan, Aklapper.

Confirmed the power supply is not working; I reseated it and it is still not working. An HP support request needs to be submitted.

@Cmjohnson did HP come back to you about this issue?
Thanks!

HP Support Case Opened.

Case ID: 5315048494
Case title: Failed Power Supply
Severity: 3-Normal

Replaced the PSU; the return shipment tracking number is 1ZW0948Y9081215654.

Sadly, it still does not boot from the disk device, and when going to the HP RAID configuration utility it says:

error: no such device: HPEZCD240.

I am a bit lost here. :-(

I was confused by that message too, @jcrespo, but it is sufficient to wait for the underlying Linux to fully boot. You'll be dropped into hpssacli after that message. (This is also documented at https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/HP_DL3N0_Gen9, but it is hard to find.)

Thanks, @fgiunchedi, but I have not advanced much:

=> controller all show

Smart Array P840 in Slot 1                (sn: PDNNF0ARH9P117)

=> controller slot=1 pd all show status

Error: The specified controller does not have any physical drives on it.

=> controller slot=1 rescan            
=> controller slot=1 pd all show status

Error: The specified controller does not have any physical drives on it.

=> controller slot=1 ssdphysicaldrive all show config

Error: The specified controller does not have any SSD physical drives on it.

There used to be drives with data here.

Compare with the equivalent, well-working, labsdb1010:

=> controller slot=1 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 1600.3 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 1600.3 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 1600.3 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 1600.3 GB): OK
   physicaldrive 1I:1:5 (port 1I:box 1:bay 5, 1600.3 GB): OK
   physicaldrive 1I:1:6 (port 1I:box 1:bay 6, 1600.3 GB): OK
   physicaldrive 1I:1:7 (port 1I:box 1:bay 7, 1600.3 GB): OK
   physicaldrive 1I:1:8 (port 1I:box 1:bay 8, 1600.3 GB): OK
   physicaldrive 1I:1:9 (port 1I:box 1:bay 9, 1600.3 GB): OK
   physicaldrive 1I:1:10 (port 1I:box 1:bay 10, 1600.3 GB): OK
   physicaldrive 1I:1:11 (port 1I:box 1:bay 11, 1600.3 GB): OK
   physicaldrive 1I:1:12 (port 1I:box 1:bay 12, 1600.3 GB): OK
   physicaldrive 1I:1:13 (port 1I:box 1:bay 13, 1600.3 GB): OK
   physicaldrive 1I:1:14 (port 1I:box 1:bay 14, 1600.3 GB): OK
   physicaldrive 1I:1:15 (port 1I:box 1:bay 15, 1600.3 GB): OK
   physicaldrive 1I:1:16 (port 1I:box 1:bay 16, 1600.3 GB): OK
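
For anyone repeating this comparison across hosts, here is a minimal sketch that parses pasted `pd all show status` output and reports how many drives the controller sees. It only handles the two output shapes quoted in this task (the `physicaldrive ...: OK` lines and the "does not have any physical drives" error). The function names and the default of 16 expected drives are assumptions for illustration: 16 is taken from the labsdb1010 output above, on the assumption that labsdb1009 has the same layout.

```python
import re

# Matches lines such as:
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 1600.3 GB): OK
PD_LINE = re.compile(r"^\s*physicaldrive\s+(\S+)\s+\(.*\):\s*(.+?)\s*$")

# Error text printed when the controller sees no drives at all.
NO_DRIVES = "does not have any physical drives"


def parse_pd_status(output):
    """Return {drive_id: status} parsed from pasted 'pd all show status' text."""
    drives = {}
    for line in output.splitlines():
        match = PD_LINE.match(line)
        if match:
            drives[match.group(1)] = match.group(2)
    return drives


def summarize(output, expected=16):
    """Summarize drive visibility; 'expected' is an assumed drive count."""
    drives = parse_pd_status(output)
    if not drives and NO_DRIVES in output:
        return "no physical drives visible to the controller"
    if len(drives) < expected:
        return f"{len(drives)} of {expected} drives visible"
    not_ok = sorted(d for d, status in drives.items() if status != "OK")
    if not_ok:
        return "drives not OK: " + ", ".join(not_ok)
    return f"all {expected} drives OK"


# Shortened samples in the two shapes pasted in this task.
BROKEN = "Error: The specified controller does not have any physical drives on it."
HEALTHY = """
   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 1600.3 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 1600.3 GB): OK
"""

print(summarize(BROKEN))
print(summarize(HEALTHY, expected=2))
```

A status string other than `OK` is simply reported as-is; the sketch does not try to interpret it.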

@jcrespo I re-seated all the components of the RAID controller and powered on; all disks now show up as one logical drive and the server booted into the OS. You may want to do some stress testing before placing it back in service.

Cmjohnson claimed this task.

@jcrespo please re-open if problem persists.