(OoW) restbase2009 lockup
Closed, Resolved, Public

Description

restbase2009 has been down since 2019-07-07 08:40. After a power cycle, a message appeared on the console about a battery failure reported by the controller (to be investigated).

Event Timeline

jijiki triaged this task as Medium priority. Jul 8 2019, 5:32 AM
jijiki added a project: serviceops.
jijiki added a subscriber: jijiki.
wiki_willy renamed this task from restbase2009 lockup to (OoW) restbase2009 lockup. Jul 15 2019, 7:39 PM
wiki_willy assigned this task to Papaul.

Indeed, the server is not showing the Smart Storage Battery status. Let's try upgrading the server firmware, since the last upgrade was in 2015.

@fgiunchedi Let me know when we can depool this server for the firmware upgrade.

Thanks

@Papaul you can take the server down as needed.

After the firmware upgrade, we still have the Smart Storage Battery problem. Since the server is out of warranty, we cannot have the part replaced.

@Eevans Shall we mark restbase2009 as inactive on conftool?

I'm not positive I understand the implications of that.

As far as I know, the host went down, and was rebooted through the management console. Later it was taken down again for a firmware update to address the smart storage battery fault (which did not clear the fault). As of this moment, the machine is up and running. What would marking it inactive do?

@Eevans I was under the impression we have more work to be done on the server. Shall we mark this task as resolved?

I was under that impression too. @Papaul's last comment indicated that there is a problem with the Smart Storage Battery, but that the machine is out of warranty. What do we do in such a situation?

@Papaul Can you let us know what our options are (if any)?

Papaul added subscribers: wiki_willy, Papaul.

@jijiki I will talk to @wiki_willy to see what our options are on this.

@wiki_willy this system has been out of warranty since April 2019 and we do have a problem with the Smart Storage Battery. The option I have here: we have 5 decommissioned HP servers, and I can check whether they use the same Smart Storage Battery as this system. If they do not, can you please advise?

Thank you.

@Papaul - if you can't find a spare from any of those decom servers, we can order it, since it's still a while before the 5yr mark.

Thanks
Willy

Mentioned in SAL (#wikimedia-operations) [2019-08-05T14:06:16Z] <jijiki> Depool and restart restbase2009 for maint - T227408

@jijiki I need this server powered down.
Thanks

@Papaul The server is depooled. Ping me when to pool it back, many thanks!

@jijiki please repool the server when you have a minute. We will have to order a new storage battery for the server, since all the decommissioned HP servers are GEN8 and this one is a GEN9, which uses a different storage battery.

Thanks.

Papaul mentioned this in Unknown Object (Task). Aug 5 2019, 4:09 PM

@jijiki I made a procurement task for the storage battery at T229847

@Papaul I am marking this as resolved, thank you!

BBlack added a subscriber: BBlack.

Reopening, as this isn't really complete yet: the battery came in and the replacement is proceeding. Since @jijiki did this before and claims it's just a depool command, we'll go with that again :)

Mentioned in SAL (#wikimedia-operations) [2019-09-11T17:05:11Z] <bblack> restbase2009 - depool for hardware work - T227408

Mentioned in SAL (#wikimedia-operations) [2019-09-11T17:07:20Z] <bblack> restbase2009 - shutdown for hardware work - T227408

Smart Storage Battery replacement complete.
Embedded HPE Smart Storage Battery 875241-B21 878643-001 6WQXL0BB2BQ4H8 01 0.60 OK
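
For anyone who wants to check this kind of status line in a script, here is a minimal sketch. It only assumes that the health state is the last whitespace-separated field, as in the line above; the function name `battery_status` is hypothetical, and the meaning of the other columns is not confirmed in this task, so they are left unparsed.

```python
def battery_status(line: str) -> str:
    """Return the last whitespace-separated field of a status line.

    Assumption: the health state (e.g. "OK") is the final field,
    as in the line pasted in this task. Other columns are ignored.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty status line")
    return fields[-1]


# The status line quoted in this task after the replacement.
line = ("Embedded HPE Smart Storage Battery 875241-B21 878643-001 "
        "6WQXL0BB2BQ4H8 01 0.60 OK")

status = battery_status(line)
print("battery healthy" if status == "OK" else f"battery status: {status}")
```

Treating anything other than "OK" as unhealthy is a deliberately conservative choice, since the set of possible status values is not listed here.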

This can be resolved