es1019 ipmi and mgmt unresponsive
Closed, Resolved, Public


This is similar to what happened with es1019 in the parent task. Icinga complains about IPMI not working, and connecting to management via SSH doesn't work either:

ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-es1019.localhost: internal IPMI error

Event Timeline

fgiunchedi created this task.

This is a slave, so if we need to reboot it, it should be doable.

Yeah, it looks like it'll need a power drain like last time in the parent task. cc ops-eqiad and @Cmjohnson for visibility.

Change 417198 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool es1019 for maintenance

Change 417198 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool es1019 for maintenance

@Cmjohnson I saw your ping yesterday but I was already out for the day.
I have now depooled es1019, so let me know when you are around today so I can stop MySQL for you and power it off.

Mentioned in SAL (#wikimedia-operations) [2018-03-08T07:22:35Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool es1019 for maintenance - T187530 (duration: 01m 16s)

Cmjohnson claimed this task.

Reset the server, drained all power, removed the power cables, and held the power button in for 10 seconds. Restored everything, and the mgmt interface now works fine. Resolving.