Monitor all mgmt hosts
Closed, Duplicate, Public

Description

We currently do not monitor mgmt interfaces (iDRAC/iLO). This means we can lose a management interface for whatever reason without noticing, then at some later point lose the machine itself and be completely locked out. This has happened before.

We should monitor them. It could be as easy as monitor::host { "${::hostname}.mgmt.${::site}.wmnet": } (plus exceptions?). However, I'm worried that our Icinga setup won't really be able to scale up to 1000+ host checks.
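A minimal sketch of what the declaration plus exceptions could look like, laid out as a manifest fragment. The `$mgmt_monitoring_exceptions` variable and the host name in it are hypothetical, introduced only for illustration. Whether `monitor::host` derives the address to check from the resource title or needs an explicit address parameter is not stated here, so that would have to be verified against the define before using this.

```puppet
# Hypothetical exception list (an assumption, not part of the existing manifests):
# hosts with no usable mgmt interface that should not get a check.
$mgmt_monitoring_exceptions = ['examplehost1001']

# Declare a host check for this machine's mgmt interface unless it is excepted.
unless $::hostname in $mgmt_monitoring_exceptions {
    # Assumption: monitor::host accepts the mgmt FQDN as its title with no
    # further parameters, as in the one-liner suggested in this task.
    monitor::host { "${::hostname}.mgmt.${::site}.wmnet": }
}
```

Keeping the exceptions in a single list would make it easy to see which machines are deliberately left unmonitored. It does not address the scaling concern, since every non-excepted host still adds one Icinga host check.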