
db1098 power redundancy lost
Closed, Resolved · Public

Description

Status:

Individual Power Supply Elements
Name        Status                                           Input Wattage (W)  Output Wattage (W)  FW Version  Part Number  Type
PS1 Status  Present                                          900                750                 00.24.43    08H33MA01    AC
PS2 Status  Present; Input lost; Input lost or out-of-range  900                750                 00.24.43    08H33MA01    AC

Lifecycle log:

2020-02-26T22:23:27-0600	RDU0012	Power supply redundancy is lost.
2020-02-26T22:23:01-0600	PSU0003	The power input for power supply 2 is lost.
2020-02-26T22:22:58-0600	PSU0800	Power Supply 2: Status = 0x284a, IOUT = 0x0, VOUT= 0x0, TEMP= 0x0, FAN = 0x0, INPUT= 0x8.
2020-02-26T22:22:58-0600	LOG007	The previous log entry was repeated 1 times.
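The lifecycle entries above are listed newest first. A minimal Python sketch, assuming the tab-separated "timestamp, message ID, message" layout shown and using a hypothetical helper `parse_lifecycle_log`, puts them in chronological order and prints each entry's offset from the first PSU 2 fault. It shows the redundancy alarm (RDU0012) being raised 29 seconds after the first PSU0800 entry.

```python
from datetime import datetime

# Entries copied from the lifecycle log above (tab-separated, newest first).
LOG = (
    "2020-02-26T22:23:27-0600\tRDU0012\tPower supply redundancy is lost.\n"
    "2020-02-26T22:23:01-0600\tPSU0003\tThe power input for power supply 2 is lost.\n"
    "2020-02-26T22:22:58-0600\tPSU0800\tPower Supply 2: Status = 0x284a, IOUT = 0x0, "
    "VOUT= 0x0, TEMP= 0x0, FAN = 0x0, INPUT= 0x8.\n"
    "2020-02-26T22:22:58-0600\tLOG007\tThe previous log entry was repeated 1 times.\n"
)


def parse_lifecycle_log(text):
    """Parse 'timestamp<TAB>message ID<TAB>message' lines, oldest first (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stamp, msg_id, message = line.split("\t", 2)
        # %z accepts UTC offsets written as -0600
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
        entries.append((when, msg_id, message))
    # sorted() is stable, so entries sharing a timestamp keep their pasted order
    return sorted(entries, key=lambda entry: entry[0])


entries = parse_lifecycle_log(LOG)
first_fault = entries[0][0]
for when, msg_id, message in entries:
    offset = int((when - first_fault).total_seconds())
    print(f"+{offset:>3}s  {msg_id:<8} {message}")
```

This only reorders the entries; it does not decode the PSU0800 register values (Status, IOUT, INPUT, ...), whose bit meanings are not given in this task.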

Internal IPMI monitoring:

Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical]

This may also be a good opportunity to do T216240.

Event Timeline

jcrespo created this task. Feb 27 2020, 9:17 AM
Restricted Application added a project: Operations. Feb 27 2020, 9:17 AM

@wiki_willy This could be a failed power supply or another power connectivity issue; there is only so much we can check remotely, so we need an onsite check. The server has been depooled from production as a precaution, but it is still running a database that is replicating data, so please ping us if/when maintenance is going to be done so we can shut it down properly.

Mentioned in SAL (#wikimedia-operations) [2020-02-27T09:35:51Z] <jynus> upgrade and restart db1084 T246323

wiki_willy moved this task from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
wiki_willy added a subscriber: wiki_willy.

@Jclark-ctr - can you check this out when you get in? Maybe a connection was knocked loose the other day. Thanks, Willy

Please ping me if it turns out to be something less obvious than a cable and you need the host down, so I can prepare it first.

Jclark-ctr closed this task as Resolved. Feb 27 2020, 6:18 PM

@jcrespo Reseated the power cable and the PSU powered on. Closing the ticket.