ps1-a7-eqiad power over threshold alerts
Closed, Resolved (Public)

Event Timeline

ayounsi created this task.
wiki_willy added a subscriber: elukey.

@Cmjohnson @elukey - just a heads up, this may put a wrench in moving one of the an-worker servers to A7. Let me see when something in this rack is next scheduled to be decom'd.

@wiki_willy I am pretty confident that the power spike is from the new ms-be1060.

I moved ms-be1060 to a different phase. I think we could add the server to A7 but it cannot be on the same phase as ms-be1060.
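
The constraint above (a new server may go into A7, but not on the same phase as ms-be1060) can be sketched as a small selection rule. This is only an illustration: the phase names, the wattage figures, and the `pick_phase` helper are all hypothetical and are not taken from the PDU or from this task.

```python
def pick_phase(phase_load_w, host_phase, avoid_host):
    """Return the least-loaded phase that does not carry avoid_host."""
    blocked = host_phase.get(avoid_host)  # phase to exclude, if the host is known
    candidates = {p: w for p, w in phase_load_w.items() if p != blocked}
    return min(candidates, key=candidates.get)

# Hypothetical per-phase load in watts and a hypothetical placement of ms-be1060.
phase_load_w = {"L1": 1800, "L2": 2100, "L3": 1500}
host_phase = {"ms-be1060": "L3"}

print(pick_phase(phase_load_w, host_phase, "ms-be1060"))  # L1
```

With these made-up numbers the lightest phase overall is L3, but it is excluded because ms-be1060 sits on it, so the next lightest phase is chosen.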

Ah cool, thanks @Cmjohnson

Got another similar alert, see:
https://librenms.wikimedia.org/graphs/id=8980/type=sensor_power/from=1616136600/to=1616223000

It's barely touching the alerting threshold though.
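
For anyone reading the graph link later, the time window it covers can be decoded as below. This is a minimal sketch that assumes the `from=` and `to=` path segments are Unix epoch seconds; `graph_window` is a hypothetical helper, not part of LibreNMS.

```python
from datetime import datetime, timezone

def graph_window(url):
    """Return (start, end) as UTC datetimes; a missing bound is returned as None."""
    # Assumption: the URL carries key=value pairs as path segments.
    params = dict(seg.split("=", 1) for seg in url.split("/") if "=" in seg)

    def to_utc(key):
        if key not in params:
            return None
        return datetime.fromtimestamp(int(params[key]), tz=timezone.utc)

    return to_utc("from"), to_utc("to")

url = ("https://librenms.wikimedia.org/graphs/id=8980/"
       "type=sensor_power/from=1616136600/to=1616223000")
start, end = graph_window(url)
print(start.isoformat())  # 2021-03-19T06:50:00+00:00
print(end - start)        # 1 day, 0:00:00
```

Under that assumption, the linked graph covers the 24 hours starting 2021-03-19 06:50 UTC.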

It keeps alerting, so I disabled alerting for that device until this is fixed.

Once fixed please re-enable it in https://librenms.wikimedia.org/device/43/edit

I'm going to reassign this to @Jclark-ctr, since he's working on refreshing some mw servers, which will give @Dzahn and the Service-Ops team the ability to decom a few of the older mw servers out of this rack. Thanks, Willy

@wiki_willy, I'd imagine that once we start decomming the mw servers in the rack, the issue will resolve itself. I do not think there is any need to keep this task open. Do you?

Hi @Cmjohnson - it's going to keep alerting until the mw servers are decommissioned, so we might as well leave it open until then. Thanks, Willy

The MW servers are out of the rack. I will make sure to balance power better when new servers are racked in A7.

Re-opening, as I noticed that alerting was still disabled for that device and the power still briefly goes above the threshold.
See https://librenms.wikimedia.org/device/device=43/tab=logs/section=eventlog/ and https://librenms.wikimedia.org/graphs/id=11444/type=sensor_power/from=1659098100/

It has been 2 weeks without any alerts, so I am closing the ticket. Nothing else will be added to this rack until we can decom some hosts from it.