
hw troubleshooting: IPMI Power Supply Failure (PS2) for wdqs2003.codfw.wmnet
Closed, Resolved | Public | Request

Description

  • FQDN: wdqs2003.codfw.wmnet
  • Status: Depooled
  • Netbox: put in failed state, see https://netbox.wikimedia.org/dcim/devices/151/
  • Urgency: High (we only have a few wdqs servers, so we'd like to minimize how long the host has to stay depooled)
  • Issue:

Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical]
Host is running fine, but with only one PSU, so it lacks PSU redundancy.

  • Assigned Papaul for ops-codfw
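
For anyone wanting to triage alerts like the one quoted under "Issue" programmatically, here is a minimal sketch that splits the status line into an overall state and per-sensor states. The function name `parse_sensor_alert` is hypothetical, and the line format is assumed from the single example in this task; sensor names that contain commas or `=` would not parse correctly.

```python
import re


def parse_sensor_alert(line):
    """Split an alert line into (overall_status, {sensor_name: state}).

    Assumed format (inferred from the one alert in this task):
      '<prefix> Status: <State> [<name> = <state>, <name> = <state>]'
    """
    match = re.search(r"Status:\s*(\w+)\s*\[(.*)\]", line)
    if match is None:
        raise ValueError("unrecognized alert line: %r" % line)
    overall, body = match.groups()
    sensors = {}
    # Each bracketed entry looks like 'Power Supply 2 = Critical'.
    for item in body.split(","):
        name, _, state = item.partition("=")
        sensors[name.strip()] = state.strip()
    return overall, sensors


ALERT = (
    "Sensor Type(s) Temperature, Power_Supply Status: Critical "
    "[Power Supply 2 = Critical, Power Supplies = Critical]"
)

overall, sensors = parse_sensor_alert(ALERT)
print(overall)
for name, state in sensors.items():
    print(name, "->", state)
```

The per-sensor mapping makes it easy to see that only Power Supply 2 and the aggregate "Power Supplies" sensor are flagged, which matches the loss of redundancy described above.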

Logs

Two relevant records:

</>hpiLO->  show /system1/log1/record4

status=0
status_tag=COMMAND COMPLETED
Wed Jan 12 21:09:57 2022


/system1/log1/record4
  Targets
  Properties
    number=4
    severity=Caution
    date=12/13/2021
    time=13:15
    description=System Power Supply: Input Power Loss or Unplugged Power Cord, Verify Power Supply Input (Power Supply 2)
  Verbs
    cd version exit show
</>hpiLO->  show /system1/log1/record5

status=0
status_tag=COMMAND COMPLETED
Wed Jan 12 21:09:53 2022



/system1/log1/record5
  Targets
  Properties
    number=5
    severity=Caution
    date=12/13/2021
    time=13:15
    description=System Power Supplies Not Redundant
  Verbs
    cd version exit show
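
If records like the two above need to be compared or collected across hosts, the `Properties` block is simple enough to parse. The following is a rough sketch, not an official iLO tool: `parse_ilo_record` is a hypothetical helper, and the layout it expects (a `Properties` line followed by indented `key=value` lines, ended by `Targets` or `Verbs`) is assumed from the output pasted in this task.

```python
def parse_ilo_record(text):
    """Return the key=value pairs found in the Properties block of a record dump."""
    props = {}
    in_properties = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Properties":
            in_properties = True
            continue
        if stripped in ("Targets", "Verbs"):
            # Any other section header ends the Properties block.
            in_properties = False
            continue
        if in_properties and "=" in stripped:
            # Split on the first '=' only; descriptions may contain more.
            key, _, value = stripped.partition("=")
            props[key] = value
    return props


SAMPLE = """\
status=0
status_tag=COMMAND COMPLETED

/system1/log1/record4
  Targets
  Properties
    number=4
    severity=Caution
    date=12/13/2021
    time=13:15
    description=System Power Supply: Input Power Loss or Unplugged Power Cord, Verify Power Supply Input (Power Supply 2)
  Verbs
    cd version exit show
"""

record = parse_ilo_record(SAMPLE)
print(record["date"], record["time"], record["severity"])
print(record["description"])
```

The `status=0` and `status_tag` lines are skipped on purpose because they describe the CLI command result, not the log record itself.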

Event Timeline

RKemper updated the task description.
RKemper renamed this task from "hw troubleshooting: IPMI Power Supply Failure for wdqs2003.codfw.wmnet" to "hw troubleshooting: IPMI Power Supply Failure (PS2) for wdqs2003.codfw.wmnet". (Wed, Jan 12, 9:17 PM)

Mentioned in SAL (#wikimedia-operations) [2022-01-12T21:19:24Z] <ryankemper> [WDQS] T299098 depooled wdqs2003 so dc-ops can take a look at the PS2 failure