codfw: ManagementSSHDown for ores2009 and thumbor2004
Closed, Resolved, Public

Description

We are getting emails from sre-observability saying that the mgmt interfaces of these servers are down. Would it be possible to depool the servers, if they are in production, so we can take a look?
Thanks.

Event Timeline

Papaul triaged this task as High priority. Nov 28 2022, 3:51 PM

Icinga downtime and Alertmanager silence (ID=8b8e8a4d-71f2-462d-8e1f-ff904f7e3ed4) set by akosiaris@cumin1001 for 1:00:00 on 1 host(s) and their services with reason: work on iDrac

thumbor2004.codfw.wmnet

ores2009 is shutting down & powering off now

thumbor2004 had a broken iDRAC card. I replaced it.

ores2009 mgmt is back up