elastic2050.mgmt is down. Maybe a restart should fix this?
History of repair attempts:
- @Mathew.onipe sent a BMC reset via the OS; no effect.
- @RobH cannot ping the mgmt interface.
- The interface likely needs a full power removal to reset it.
Mentioned in SAL (#wikimedia-operations) [2019-08-16T14:39:06Z] <onimisionipe> run bmc-device --cold-reset; echo $? in elastic2050 hoping it resets mgmt interface -T230597
Please note this mgmt interface is still down:
robh@cumin2001:~$ ping elastic2050.mgmt.codfw.wmnet
PING elastic2050.mgmt.codfw.wmnet (10.193.3.56) 56(84) bytes of data.
No ping replies come back.
First step: check the cable (@Papaul will have to do this).
If that doesn't fix it, the DRAC can only be reset at this point by fully removing power from the system.
When can this system be taken out of service and left without power for a few minutes?
IRC sync: Chatted with @Mathew.onipe, who let me know they had synced with @Papaul to take this offline on Monday to reset the power/bmc.
@Papaul On second thought, we have other servers, and losing one elastic node is OK, so this should be set to normal.
Mentioned in SAL (#wikimedia-operations) [2019-08-19T07:59:46Z] <onimisionipe> shutdown elastic2050 to prepare for mgmt reset - T230597
Upgraded the firmware as well.
Before
BIOS Version 1.5.6
iDRAC Firmware Version 3.21.21.21
After
BIOS Version 2.2.11
iDRAC Firmware Version 3.34.34.34
Server is back up. Resolving this.
Mentioned in SAL (#wikimedia-operations) [2019-08-19T16:45:22Z] <onimisionipe> pool elastic2050. mgmt issue has been resolved - T230597