
mw1323 stuck after reboot
Closed, Resolved, Public

Description

Host mw1323 (eqiad row C / C6) is stuck after a reboot: it did not recover properly and has no ssh or network connectivity on its main interface. The host is responding on mw1323.mgmt.eqiad.wmnet. A racadm serveraction powercycle did not bring the server back up.

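racadm lists four critical entries, all from 09/16/2020: input power was lost briefly on power supply 2 and then on power supply 1, and was restored both times: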

/admin1-> racadm getsel
Record:      1
Date/Time:   03/30/2017 13:44:09
Source:      system
Severity:    Ok
Description: Log cleared.
-------------------------------------------------------------------------------
Record:      2
Date/Time:   09/16/2020 13:09:30
Source:      system
Severity:    Critical
Description: Power supply redundancy is lost.
-------------------------------------------------------------------------------
Record:      3
Date/Time:   09/16/2020 13:09:30
Source:      system
Severity:    Critical
Description: The power input for power supply 2 is lost.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   09/16/2020 13:21:10
Source:      system
Severity:    Ok
Description: The input power for power supply 2 has been restored.
-------------------------------------------------------------------------------
Record:      5
Date/Time:   09/16/2020 13:21:20
Source:      system
Severity:    Ok
Description: The power supplies are redundant.
-------------------------------------------------------------------------------
Record:      6
Date/Time:   09/16/2020 13:35:05
Source:      system
Severity:    Critical
Description: Power supply redundancy is lost.
-------------------------------------------------------------------------------
Record:      7
Date/Time:   09/16/2020 13:35:05
Source:      system
Severity:    Critical
Description: The power input for power supply 1 is lost.
-------------------------------------------------------------------------------
Record:      8
Date/Time:   09/16/2020 13:42:45
Source:      system
Severity:    Ok
Description: The input power for power supply 1 has been restored.
-------------------------------------------------------------------------------
Record:      9
Date/Time:   09/16/2020 13:42:55
Source:      system
Severity:    Ok
Description: The power supplies are redundant.
-------------------------------------------------------------------------------
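
For longer event logs, the critical entries can be pulled out with a short script. This is a minimal sketch, assuming the output keeps the `Key: value` layout and dashed separator lines shown above; the names `parse_sel` and `critical_events` are hypothetical helpers, not part of racadm, and the sample is an excerpt of the log in this task.

```python
# Excerpt of the `racadm getsel` output pasted in this task.
SAMPLE = """\
Record:      2
Date/Time:   09/16/2020 13:09:30
Source:      system
Severity:    Critical
Description: Power supply redundancy is lost.
-------------------------------------------------------------------------------
Record:      3
Date/Time:   09/16/2020 13:09:30
Source:      system
Severity:    Critical
Description: The power input for power supply 2 is lost.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   09/16/2020 13:21:10
Source:      system
Severity:    Ok
Description: The input power for power supply 2 has been restored.
-------------------------------------------------------------------------------
"""


def parse_sel(text):
    """Split log text into one dict per record (hypothetical helper)."""
    records = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # A line made only of dashes closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def critical_events(records):
    """Keep only records whose Severity field is 'Critical'."""
    return [r for r in records if r.get("Severity") == "Critical"]


for event in critical_events(parse_sel(SAMPLE)):
    print(event["Date/Time"], event["Description"])
```

Lines without a colon, such as the `/admin1->` prompt, are ignored, so the full console paste can be fed in unchanged.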

The host is depooled, so it can be worked on at any time.

Event Timeline

Cmjohnson claimed this task.
Cmjohnson subscribed.

Did a hard power reset and the server came back okay. No hardware issues were found.