
analytics1067: Broken BBU
Closed, Resolved (Public)

Description

Looks like it has a broken BBU:

root@analytics1067:~# megacli -AdpBbuCmd -GetBbuStatus -a0

BBU status for Adapter: 0

BatteryType: BBU
Battery State: Unknown

Because of that, the RAID write cache policy fell back to WriteThrough (WT):

root@analytics1067:~# megacli -LDInfo -L0 -a0


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 931.0 GB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 931.0 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
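The mismatch between the default and current cache policy in the output above is the symptom to look for. A minimal sketch of how that comparison could be scripted follows; `check_cache_policy` is a hypothetical helper name, not part of megacli, and it assumes the two `Cache Policy` lines look exactly like the ones pasted in this task. It only inspects the last virtual drive it sees, so it is best fed the output for a single `-L` target.

```shell
#!/bin/sh
# Hypothetical helper (not part of megacli): reads `megacli -LDInfo` output
# on stdin and compares the write policy in the "Default Cache Policy" and
# "Current Cache Policy" lines. Only the first comma-separated item
# (WriteBack / WriteThrough) is compared.
check_cache_policy() {
    awk -F': *' '
        /^Default Cache Policy/ { split($2, d, ","); def = d[1] }
        /^Current Cache Policy/ { split($2, c, ","); cur = c[1] }
        END {
            if (def == "" || cur == "") {
                print "UNKNOWN: cache policy lines not found"
                exit 3
            }
            if (def != cur) {
                print "DEGRADED: default " def ", current " cur
                exit 1
            }
            print "OK: " cur
        }
    '
}

# Usage on the host (requires root and megacli):
#   megacli -LDInfo -L0 -a0 | check_cache_policy
```

With the output pasted above, this sketch would report a degraded state, because the default is WriteBack while the current policy is WriteThrough.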

Event Timeline

Marostegui triaged this task as Medium priority. Jun 13 2017, 1:30 PM

@Cmjohnson this host is from the last batch (so it is definitely under warranty); can you order a new BBU whenever you have time?

I have not forced the RAID into WriteBack (WB); I would leave that decision to Analytics.
If needed, this should be the command:

megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll

And to revert it:

megacli -LDSetProp -NoCachedBadBBU -Immediate -Lall -aAll
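For convenience, the two commands above could be wrapped so that the intended action is explicit and nothing runs by accident. This is only a sketch: `raid_write_policy` and the `APPLY` variable are hypothetical names introduced here, and the megacli flags are copied verbatim from the comment above. By default it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical wrapper around the two megacli commands quoted above.
# Prints the command by default; set APPLY=1 to actually run it (needs root).
raid_write_policy() {
    case "$1" in
        force)  flag="-ForcedWB" ;;        # force WriteBack even with a bad BBU
        revert) flag="-NoCachedBadBBU" ;;  # back to "No Write Cache if Bad BBU"
        *)
            echo "usage: raid_write_policy force|revert" >&2
            return 2
            ;;
    esac
    # Rebuild the positional parameters as the full command line.
    set -- megacli -LDSetProp "$flag" -Immediate -Lall -aAll
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Usage:
#   raid_write_policy force            # dry run, prints the command
#   APPLY=1 raid_write_policy revert   # actually reverts the setting
```

Forcing WriteBack without a healthy BBU risks losing cached writes on power failure, which is why the dry run is the default in this sketch.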

Looks like it recovered by itself:

˜/icinga-wm 16:06> RECOVERY - MegaRAID on analytics1067 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
root@analytics1067:~# megacli -AdpBbuCmd -GetBbuStatus -a0 | grep -e '^isSOHGood' -e '^Charger Status' -e '^Remaining Capacity'
Charger Status: In Progress
Remaining Capacity: 350 mAh
isSOHGood: Yes
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
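The three BBU fields filtered with `grep` above could also be condensed into a single status line. A minimal sketch, assuming the field names appear at the start of the line exactly as in the paste above; `bbu_health` is a hypothetical helper name, and treating anything other than `isSOHGood: Yes` as a warning is an assumption of this sketch, not documented megacli behaviour.

```shell
#!/bin/sh
# Hypothetical helper: reads `megacli -AdpBbuCmd -GetBbuStatus` output on
# stdin and summarises the three fields filtered in this task.
bbu_health() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }              # drop trailing whitespace / CR
        /^isSOHGood/          { soh = $2 }
        /^Charger Status/     { charger = $2 }
        /^Remaining Capacity/ { cap = $2 }
        END {
            if (soh == "") {
                print "UNKNOWN: isSOHGood not reported"
                exit 3
            }
            status = (soh == "Yes") ? "OK" : "WARNING"
            printf "%s: isSOHGood=%s, charger=%s, remaining=%s\n", status, soh, charger, cap
            if (status != "OK") exit 1
        }
    '
}

# Usage on the host (requires root and megacli):
#   megacli -AdpBbuCmd -GetBbuStatus -a0 | bbu_health
```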

Let's close this and if it happens again, we can troubleshoot more.

It looks like the server might not show the correct BBU status while the battery is recharging. The BBU wasn't broken; it had just started an Auto-Learn cycle: T167809
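To tell a learn cycle apart from a genuinely failed battery, one option is to look for a learn-cycle indicator in the BBU status output before ordering a replacement. The sketch below assumes the output contains a line of the form `Learn Cycle Active : Yes`; that line is not shown anywhere in this task, so the field name is an assumption that should be checked against the actual output on the host. `learn_cycle_active` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `megacli -AdpBbuCmd -GetBbuStatus` output on
# stdin and exits 0 if a learn cycle appears to be running.
# ASSUMPTION: the output has a "Learn Cycle Active : Yes|No" line; this
# field does not appear in the output pasted in this task.
learn_cycle_active() {
    awk -F':' '
        /Learn Cycle Active/ {
            gsub(/[ \t\r]/, "", $2)   # strip padding around the value
            if ($2 == "Yes") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the host (requires root and megacli):
#   if megacli -AdpBbuCmd -GetBbuStatus -a0 | learn_cycle_active; then
#       echo "Learn cycle in progress; BBU status may be misleading"
#   fi
```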