
Remove db1082's BBU on-site
Closed, Resolved (Public)

Description

This host has crashed twice due to BBU issues (T258336).
Neither the OS nor the ILO reports the BBU as disabled or gone, and I haven't found a way to force that.
I believe we'll keep seeing crashes if we don't remove or disable it.

Other hosts with a broken BBU have been "fine" as soon as the BBU was "gone" and no longer showed up, so I would like to force this host into the same state.

HW logs for both crashes are:

description=Smart Storage Battery has exceeded the maximum amount of devices supported (Battery 1, service information: 0x07). Action: 1. Remove additional devices. 2. Consult server troubleshooting guide. 3. Gather AHS log and contact Support

Can we manually remove/disable it on site?
This host is scheduled to be decommissioned in Q2, but I am trying to work out if we can accelerate the purchase of its replacement in Q1 instead.

Event Timeline

Marostegui created this task.

@Marostegui is the host down now? I can remove it in about 1 hour.

Mentioned in SAL (#wikimedia-operations) [2020-07-27T16:04:31Z] <marostegui> Stop MySQL on db1082 for onsite maintenance - T258910

@Jclark-ctr the host is now off, you can proceed whenever you want
Thank you!

Thanks - I can see it:

root@db1082:~# hpssacli controller all show detail | grep -i battery
   No-Battery Write Cache: Disabled
   Battery/Capacitor Count: 0
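
For checking other hosts the same way, here is a minimal sketch that pulls the battery count out of the `hpssacli controller all show detail` text. It assumes the field is labelled exactly as in the output above; the function name `battery_count` and the `SAMPLE` constant are made up for illustration and are not part of any existing tooling.

```python
import re
from typing import Optional

# Matches a line such as "   Battery/Capacitor Count: 0".
# Assumption: the label is spelled exactly as in the output quoted above.
_COUNT_RE = re.compile(r"^\s*Battery/Capacitor Count:\s*(\d+)\s*$", re.MULTILINE)


def battery_count(detail_output: str) -> Optional[int]:
    """Return the reported Battery/Capacitor Count, or None if the field is missing."""
    match = _COUNT_RE.search(detail_output)
    return int(match.group(1)) if match else None


# Sample text copied from the grep output above.
SAMPLE = """\
   No-Battery Write Cache: Disabled
   Battery/Capacitor Count: 0
"""

if __name__ == "__main__":
    count = battery_count(SAMPLE)
    if count is None:
        print("Battery/Capacitor Count not found in output")
    elif count == 0:
        print("No BBU reported by the controller")
    else:
        print(f"Controller still reports {count} battery/capacitor unit(s)")
```

In practice the text would come from running the command shown above on the host and passing its output to `battery_count`. A count of 0 corresponds to the "gone" state described in the task.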

Thank you guys for the fast response!