audit all codfw pdu tower draws
Closed, Invalid (Public)

Description

When troubleshooting T163339 with @Papaul, I was logged into https://ps1-a1-codfw.mgmt.codfw.wmnet and could see that tower A had a full load, while tower B had next to no load.

This is typically caused when the servers' BIOS power redundancy setting is wrong and puts the secondary PSU into standby rather than running both PSUs active/active. It is non-ideal, as the PSU doesn't experience a full load, and it makes balancing power between the phases more difficult (as each system draws more amperage on that tower).

I've created this task so that I can go back and audit all of the racks and note which ones seem to have improper balance between towers A and B. We may also want to add monitoring for this, since we monitor for phase imbalance (between the x/y/z phases in a tower) but not for tower imbalance.
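As a rough illustration of what a tower-imbalance check could look like, here is a minimal Python sketch. It assumes the per-tower amperage for each rack has already been collected somehow (for example, read off the PDU web interface by hand). The function names, the rack labels, the sample readings, and the 0.5 threshold are all hypothetical and not taken from any existing monitoring.

```python
def tower_imbalance(amps_a: float, amps_b: float) -> float:
    """Return 0.0 when both towers draw the same, 1.0 when one tower carries everything."""
    total = amps_a + amps_b
    if total <= 0:
        # Nothing is drawing power, so there is nothing to call imbalanced.
        return 0.0
    return abs(amps_a - amps_b) / total


def flag_racks(readings: dict, threshold: float = 0.5) -> list:
    """Return the sorted rack names whose tower A/B imbalance exceeds the threshold.

    `readings` maps a rack name to a (tower_a_amps, tower_b_amps) tuple.
    The default threshold is an arbitrary placeholder, not a tuned value.
    """
    return sorted(
        rack
        for rack, (amps_a, amps_b) in readings.items()
        if tower_imbalance(amps_a, amps_b) > threshold
    )


if __name__ == "__main__":
    # Made-up readings for illustration only.
    sample = {"a1": (12.0, 0.5), "a2": (6.0, 5.5)}
    print(flag_racks(sample))
```

A rack with nearly all of its draw on one tower, as seen on ps1-a1-codfw, would score close to 1.0, while a rack with active/active PSUs should score close to 0.0.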

Event Timeline

There may also be an iDRAC/IPMI command to query how much each power supply unit is drawing in the systems. Need to check.
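One possible starting point, sketched in Python: `ipmitool sdr type Current` lists current sensors over IPMI, and on some servers these include one reading per PSU. Whether the codfw servers expose per-PSU current this way, and under which sensor names, is an assumption that still needs checking. The function names and the host, user, and password parameters are placeholders.

```python
import subprocess


def parse_amps(sdr_output: str) -> dict:
    """Extract {sensor name: amps} from pipe-delimited `ipmitool sdr` output.

    Assumes the usual five-column layout (name | id | status | entity | reading)
    and keeps only rows whose reading looks like "<number> Amps".
    """
    readings = {}
    for line in sdr_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue
        parts = fields[4].split()
        if len(parts) == 2 and parts[1] == "Amps":
            try:
                readings[fields[0]] = float(parts[0])
            except ValueError:
                continue
    return readings


def query_current(host: str, user: str, password: str) -> dict:
    """Run ipmitool against a management interface and parse the current sensors.

    Sensor naming varies by vendor and model, so the result needs a manual
    sanity check before it is trusted for an audit.
    """
    cmd = [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "sdr", "type", "Current",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_amps(result.stdout)
```

If one PSU reports close to 0 amps while the other carries the load, that would be consistent with the secondary PSU sitting in standby.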

RobH removed RobH as the assignee of this task.

Duplicate of T163339?

Yep!