
(OoW) restbase2009 lockup
Closed, ResolvedPublic

Description

restbase2009 has been down since 2019-07-07 08:40. On power off/on, a message appeared on the console about a battery failure from the controller (to be investigated).

Event Timeline

Restricted Application added a subscriber: Aklapper. Jul 7 2019, 3:21 PM
jijiki triaged this task as Normal priority. Jul 8 2019, 5:32 AM
jijiki added a project: serviceops.
jijiki added a subscriber: jijiki.
wiki_willy renamed this task from restbase2009 lockup to (OoW) restbase2009 lockup. Jul 15 2019, 7:39 PM
wiki_willy assigned this task to Papaul.

Indeed, the server is not showing the Smart Storage Battery status. Let's try upgrading the server firmware, since the last upgrade was in 2015.

@fgiunchedi Let me know when we can depool this server for firmware upgrade.

Thanks

@Papaul you can take the server down as needed.

After the firmware upgrade, we still have the Smart Storage Battery problem. Since the server is out of warranty, we cannot have the part replaced.

jijiki moved this task from Backlog to Next up on the serviceops board. Jul 19 2019, 8:56 AM

@Eevans Shall we mark restbase2009 as inactive on conftool?

@Eevans Shall we mark restbase2009 as inactive on conftool?

I'm not positive I understand the implications of that.

As far as I know, the host went down, and was rebooted through the management console. Later it was taken down again for a firmware update to address the smart storage battery fault (which did not clear the fault). As of this moment, the machine is up and running. What would marking it inactive do?

@Eevans I was under the impression we have more work to be done on the server. Shall we mark this task as resolved?

Eevans added a subscriber: PPaul. Fri, Jul 26, 1:55 AM

@Eevans I was under the impression we have more work to be done on the server. Shall we mark this task as resolved?

I was under that impression too. @PPaul's last comment indicated that there is a problem with the Smart storage battery, but that the machine is out of warranty. What do we do in such a situation?

jijiki added a comment. Fri, Jul 26, 12:34 PM

@Papaul Can you let us know what our options are (if any)?

PPaul removed a subscriber: PPaul. Fri, Jul 26, 3:07 PM
Papaul reassigned this task from Papaul to wiki_willy. Thu, Aug 1, 1:32 PM
Papaul added subscribers: wiki_willy, Papaul.

@jijiki I will talk to @wiki_willy to see what our options are on this.

@wiki_willy this system has been out of warranty since April 2019 and we do have a problem with the Smart Storage Battery. The option I have here: we have 5 HP servers that were decom, and I can check whether they use the same Smart Storage Battery as this system. If they do not, can you please advise?

Thank you.

@Papaul - if you can't find a spare from any of those decom servers, we can order it, since it's still a while before the 5yr mark.

Thanks
Willy

wiki_willy reassigned this task from wiki_willy to Papaul. Fri, Aug 2, 1:04 AM

Mentioned in SAL (#wikimedia-operations) [2019-08-05T14:06:16Z] <jijiki> Depool and restart restbase2009 for maint - T227408

Papaul added a comment. Mon, Aug 5, 2:10 PM

@jijiki I need this server powered down.
Thanks

jijiki added a comment. Mon, Aug 5, 2:10 PM

@Papaul Server is depooled. Ping me when to pool it back, many thanks!

Papaul added a comment. Mon, Aug 5, 3:55 PM

@jijiki please repool the server when you have a minute. We will have to order a new storage battery for the server, since all the decom HP servers are GEN8 and this one is a GEN9, which uses a different storage battery.

Thanks.

Papaul mentioned this in Unknown Object (Task). Mon, Aug 5, 4:09 PM

@jijiki I made a procurement task for the storage battery at T229847

Mentioned in SAL (#wikimedia-operations) [2019-08-05T17:33:42Z] <jijiki> Pool restbase2009 - T227408

jijiki closed this task as Resolved. Mon, Aug 5, 5:34 PM

@Papaul I am marking this as resolved, thank you!