
Phase monitoring for new PDUs
Closed, Resolved · Public

Description

While working on T148541, I noticed that the latest PDUs (i.e. those using the sentry4 SNMP MIB), installed in eqiad as part of T226778, are failing their phase monitoring checks, whereas ulsfo PDUs installed in T209101 are currently missing icinga phase monitoring checks (i.e. only ping checks).


Event Timeline

Restricted Application added a subscriber: Aklapper. Jul 26 2019, 11:22 AM
faidon added a subscriber: faidon. Jul 26 2019, 11:35 AM

whereas ulsfo PDUs installed in T209101 are currently missing icinga phase monitoring checks (i.e. only ping checks)

Note that ulsfo does not have 3-phase power so it makes sense here to be different from eqiad/codfw. It would probably still make sense to have something to monitor that single phase, though.

herron triaged this task as Normal priority. Jul 26 2019, 4:26 PM
herron added a project: observability.
fgiunchedi moved this task from Backlog to Up next on the observability board. Aug 5 2019, 2:30 PM

Change 529790 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] facilities: introduce monitor_pdu_phase for ulsfo PDUs

https://gerrit.wikimedia.org/r/529790

Change 529791 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: generate targets for single phase PDUs

https://gerrit.wikimedia.org/r/529791

Change 529790 merged by Filippo Giunchedi:
[operations/puppet@production] facilities: introduce monitor_pdu_phase for ulsfo PDUs

https://gerrit.wikimedia.org/r/529790

Change 529791 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: generate targets for single phase PDUs

https://gerrit.wikimedia.org/r/529791

Change 537646 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] facilities: support sentry4 for 3 phase monitoring

https://gerrit.wikimedia.org/r/537646

@RobH, related to T148541 (Replace Torrus with Prometheus snmp_exporter for PDUs monitoring): I took a stab at adjusting the OIDs for phase monitoring on 3-phase sentry4 PDUs in https://gerrit.wikimedia.org/r/537646. Please take a look when you get a chance.

Change 537646 merged by Filippo Giunchedi:
[operations/puppet@production] facilities: support sentry4 for 3 phase monitoring

https://gerrit.wikimedia.org/r/537646

Change 538161 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] facilities: add phase monitoring for single phase PDUs

https://gerrit.wikimedia.org/r/538161

fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board.

Change 538161 merged by Filippo Giunchedi:
[operations/puppet@production] facilities: add phase monitoring for single phase PDUs

https://gerrit.wikimedia.org/r/538161

fgiunchedi closed this task as Resolved. Sep 23 2019, 8:41 AM
fgiunchedi claimed this task.

This is completed: we're now monitoring the single phase in ulsfo with the same settings as codfw/eqiad!