hw troubleshooting: Broken PSU on parse2004
Closed, Resolved, Public


  • Provide FQDN of system.
  • If other than a hard drive issue, please depool the machine (and confirm that it’s been depooled) for us to work on it. If not, please provide time frame for us to take the machine down.
  • Put system into a failed state in Netbox.
  • Provide urgency of request, along with justification (redundancy, dependencies, etc)
  • Describe issue and/or attach hardware failure log. (Refer to if you need help)
  • Assign correct project tag and appropriate owner (based on above). Also, please ensure the service owners of the host(s) are added as subscribers to provide any additional input.
  • FQDN: parse2004.codfw.wmnet
  • Urgency: Medium
  • Out of warranty, but a PSU can maybe be salvaged from a decommed system?
Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical]

Nothing in racadm getsel or ipmi-sel; the SEL seems to have been cleared, but the ipmimonitoring Critical status remains.
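
For anyone scripting a follow-up check, the status line quoted above can be split into an overall state and per-sensor states. This is only a rough sketch: `parse_sensor_status` is a hypothetical helper, not part of any existing tooling, and it assumes the line always has the `Status: <state> [name = state, ...]` shape seen in this task.

```python
import re


def parse_sensor_status(line):
    """Split a check-output line into (overall state, {sensor: state}).

    Assumes the 'Status: <state> [name = state, ...]' layout quoted in
    this task; other layouts raise ValueError.
    """
    match = re.search(r"Status:\s*(\w+)\s*\[(.*)\]", line)
    if match is None:
        raise ValueError("unrecognised status line")
    overall, details = match.groups()
    sensors = {}
    for item in details.split(","):
        # Each item looks like 'PS Redundancy = Critical'.
        name, _, state = item.partition("=")
        sensors[name.strip()] = state.strip()
    return overall, sensors


line = (
    "Sensor Type(s) Temperature, Power_Supply Status: Critical "
    "[Status = Critical, PS Redundancy = Critical]"
)
overall, sensors = parse_sensor_status(line)
print(overall)
print(sensors)
```

Both the `Status` and `PS Redundancy` entries report Critical here, which matches the alert staying active even though the SEL was empty.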


Event Timeline

Mentioned in SAL (#wikimedia-operations) [2023-03-17T13:21:10Z] <claime> Depooling parse2004.codfw.wmnet for broken PSU - T332119

Clement_Goubert renamed this task from Broken PSU on parse2004 to hw troubleshooting: Broken PSU on parse2004. Fri, Mar 17, 1:34 PM
Clement_Goubert assigned this task to Papaul.
Clement_Goubert updated Other Assignee, added: Clement_Goubert.
Clement_Goubert added a project: DC-Ops.
Clement_Goubert updated the task description.
Clement_Goubert removed a project: SRE.
Clement_Goubert added a subscriber: serviceops.
Papaul moved this task from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Papaul added a subscriber: Papaul.
Jhancock.wm added a subscriber: Jhancock.wm.

The physical PSU is showing as up and the server does not have an amber warning light on. Replaced the PSU with one from a decommed server. The alert in iDRAC has cleared =D

Icinga checks for the PSUs are all green. We can resolve the task.