es1019 ipmi and mgmt unresponsive
Closed, Resolved (Public)

Description

This is similar to what happened with es1019 in the parent task. Icinga complains about IPMI not working, and connecting to the management interface via SSH doesn't work either:

ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-es1019.localhost: internal IPMI error

Event Timeline

fgiunchedi triaged this task as Normal priority. Feb 16 2018, 11:22 AM
fgiunchedi created this task.

This is a slave, so if we need to reboot it, it should be doable.

Yeah, it looks like it'll need a power drain like last time in the parent task. cc ops-eqiad and @Cmjohnson for visibility.

Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Feb 16 2018, 3:47 PM

Change 417198 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool es1019 for maintenance

https://gerrit.wikimedia.org/r/417198

Change 417198 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool es1019 for maintenance

https://gerrit.wikimedia.org/r/417198

@Cmjohnson I saw your ping yesterday but I was already out for the day.
I have now depooled es1019, so let me know when you are around today so I can stop MySQL for you and power it off.

Mentioned in SAL (#wikimedia-operations) [2018-03-08T07:22:35Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool es1019 for maintenance - T187530 (duration: 01m 16s)

Mentioned in SAL (#wikimedia-operations) [2018-03-08T14:38:19Z] <marostegui> Stop mysql on es1019 - T187530

Mentioned in SAL (#wikimedia-operations) [2018-03-08T15:41:52Z] <marostegui> Power off es1019 - T187530

Cmjohnson closed this task as Resolved. Mar 8 2018, 3:49 PM
Cmjohnson claimed this task.

Reset the server, drained all power, removed the power cables, and held in the power button for 10 secs. Restored everything, and the mgmt interface now works fine. Resolving.