
elastic1031 - PSU status critical
Closed, Resolved, Public

Description

Icinga reports that the power supply status on elastic1031 became CRITICAL about 8 days ago.

There is also a linked parent task to replace these servers, which says the replacement can start. If this is a PSU problem, though, it might need to be handled separately. It also appears to affect only this single host.


CRITICAL
(for 8d 2h 57m 23s)
Status Information: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical, Status = Critical]
Performance Data: 'Inlet Temp'=23.00;3.00:42.00;-7.00:47.00 'Temp'=68.00 'Temp'=61.00
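The "Performance Data" line follows the Nagios plugin perfdata convention (`'label'=value;warn;crit;min;max`, where a range like `3.00:42.00` alerts when the value falls outside it). Reading it that way, the temperature values look within their thresholds, which suggests the CRITICAL state comes from the Power_Supply sensors rather than from temperature. The sketch below is a minimal, hypothetical parser written from that convention (`parse_perfdata`, `Range`, `status` are illustrative names, not part of Icinga); it assumes numeric values and ignores units and min/max.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Range:
    """Nagios-style threshold range: alert when the value is OUTSIDE start..end,
    or INSIDE it when the range was written with a leading '@'."""
    start: float
    end: float
    inside: bool = False

    def alerts(self, value: float) -> bool:
        within = self.start <= value <= self.end
        return within if self.inside else not within


def parse_range(text: str) -> Optional[Range]:
    """Parse '10', '10:', '~:10', '10:20' or '@10:20'; empty means no threshold."""
    if not text:
        return None
    inside = text.startswith("@")
    if inside:
        text = text[1:]
    if ":" in text:
        lo, hi = text.split(":", 1)
        start = -math.inf if lo == "~" else float(lo or 0)
        end = math.inf if hi == "" else float(hi)
    else:
        # A bare number N means the range 0..N.
        start, end = 0.0, float(text)
    return Range(start, end, inside)


@dataclass
class PerfDatum:
    label: str
    value: float
    warn: Optional[Range]
    crit: Optional[Range]


# A label is either single-quoted (may contain spaces) or a bare token.
PERF_ITEM = re.compile(r"(?:'([^']*)'|([^\s=']+))=(\S+)")


def parse_perfdata(line: str) -> List[PerfDatum]:
    data = []
    for m in PERF_ITEM.finditer(line):
        label = m.group(1) if m.group(1) is not None else m.group(2)
        fields = m.group(3).split(";")
        fields += [""] * (5 - len(fields))  # pad missing warn/crit/min/max
        number = re.match(r"-?[\d.]+", fields[0])
        if number is None:
            continue  # e.g. 'U' (undetermined value); skipped in this sketch
        data.append(PerfDatum(label, float(number.group()),
                              parse_range(fields[1]), parse_range(fields[2])))
    return data


def status(d: PerfDatum) -> str:
    # Critical takes precedence over warning.
    if d.crit and d.crit.alerts(d.value):
        return "CRITICAL"
    if d.warn and d.warn.alerts(d.value):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    line = "'Inlet Temp'=23.00;3.00:42.00;-7.00:47.00 'Temp'=68.00 'Temp'=61.00"
    for d in parse_perfdata(line):
        print(d.label, d.value, status(d))
```

The two `'Temp'` entries carry no thresholds, so under this convention they can never alert on their own.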

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=elastic1031&service=IPMI+Sensor+Status

Event Timeline

Dzahn created this task. Jul 31 2019, 5:44 PM
wiki_willy added subscribers: Jclark-ctr, wiki_willy.

@Jclark-ctr - whenever you have a few min free, can you see if this is just a loose cable that maybe got accidentally pulled from the PDU swap last week? If it's actually a bad PSU, I think we can leave it, since it's due to be refreshed via T221636.

Thanks,
Willy

Inspected elastic1031: both PSUs show green. Inspected the cables and verified they are fully seated in the recently replaced PDU. No physical faults found.

Gehel added a comment. Aug 5 2019, 3:26 PM

If it's actually a bad PSU, I think we can leave it, since it's due to be refreshed via T221636.

Confirmed, we can live without this node for a few weeks if it dies, so no need to spend time on diagnosis.

Cmjohnson closed this task as Resolved. Aug 7 2019, 2:39 PM
Cmjohnson added a subscriber: Cmjohnson.

I will resolve this task for now. If it becomes critical again, please reopen it.

Dzahn removed a subscriber: Dzahn. Aug 7 2019, 4:01 PM