db2135 (C6) lost power supply redundancy
Closed, Resolved (Public)

Description

@Papaul, can you check the status of db2135's power supplies? It lives in C6 and we just got this alert:

<+icinga-wm> PROBLEM - IPMI Sensor Status on db2135 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures

Thanks!

Event Timeline

This task is similar to T314559. I wonder whether it is just a loose cable or whether the power supplies died.

Mentioned in SAL (#wikimedia-operations) [2022-08-05T13:27:09Z] <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depool hosts with fragile power supply (T314559 T314628)', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json

This is complete.

Change 822310 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Reenable notifications for db2135

https://gerrit.wikimedia.org/r/822310

Change 822310 merged by Jcrespo:

[operations/puppet@production] mariadb: Reenable notifications for db2135

https://gerrit.wikimedia.org/r/822310