db2081 unreachable
Closed, Resolved · Public

Description

Hi,

I have rebooted db2081 (A6) and it has not come back.
The ssh mgmt interface isn't responding either, so I cannot really connect to see what's going on there.

Can you help us to see what's going on?

Event Timeline

Papaul triaged this task as Medium priority. Oct 16 2017, 2:26 PM

Please see the attachment for the error that was on the screen.

Steps taken:

  • Removed the PSUs for a couple of minutes
  • Updated iDRAC firmware from 2.40 to 2.50
  • Updated BIOS from 2.4.3 to 2.5.5
  • Powered the system back up

Leaving this task open for now.

Thanks @Papaul for troubleshooting this.
This was one of the new servers we recently bought.
Will you talk to the vendor to get a technician to look at it or advise?

This server was not in production, so we can do whatever we like with it to get it fixed.

@Marostegui, when you call Dell, the first thing they will tell you is to update the firmware. That is why I didn't call them and instead went ahead and updated all the firmware, leaving the task open to monitor the system. If this happens again, I will get in touch with Dell.

Thanks.

I think lately we have been doing the following: if it is the first time, upgrade the firmware and collect logs or other proof. If it is the second time, collect evidence and ask for a replacement. We can do that here and close the ticket for now.

Ah sure :-)
I can see the system is up again, so I will start MySQL and leave it running as it normally would.

Thanks!

Mentioned in SAL (#wikimedia-operations) [2017-10-17T07:54:27Z] <marostegui> Stop MySQL and reboot db2081 to see if it works fine - T178140

I am going to close this as resolved for now. I rebooted the host twice without any issues.
So far it looks good. If it happens again, we can reopen.

Thanks @Papaul for fixing this