
analytics1066's BBU might need to be replaced
Closed, Resolved (Public)

Description

Hi DCops,

for some reason the logical drives (LDs) on analytics1066 are not picking up the WriteBack setting, as megacli shows:

elukey@analytics1066:~$ sudo megacli -LDInfo -Lall -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default

I tried to force it, but nothing happened:

sudo megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll
sudo megacli -LDSetProp WB -LALL -aALL

The BBU looks good from what I can see, but I suspect that it might be broken or not working correctly:

elukey@analytics1066:~$ sudo megacli -AdpBbuCmd -GetBbuStatus -a0 
                                     
BBU status for Adapter: 0

BatteryType: BBU
Voltage: 3909 mV
Current: 0 mA
Temperature: 49 C
Battery State: Optimal
BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : OK
  Temperature                             : OK
  Learn Cycle Requested	                  : No
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : No
  Periodic Learn Required                 : No
  Transparent Learn                       : No
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No

BBU GasGauge Status: 0x0138 
Relative State of Charge: 94 %
Charger Status: Complete
Remaining Capacity: 276 mAh
Full Charge Capacity: 296 mAh
isSOHGood: Yes

Exit Code: 0x00
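
For reference, the status dump above can be turned into a small dictionary and checked mechanically. This is only a rough sketch in Python: the function names and the choice of which flags count as warnings are assumptions made for illustration, not part of megacli or of any existing check. On the values pasted above it raises no flags, which matches the "looks good from what I can see" reading.

```python
# Trimmed copy of the BBU status pasted in this task.
SAMPLE_BBU = """\
Battery State: Optimal
BBU Firmware Status:

  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : No
  Pack is about to fail & should be replaced : No

Relative State of Charge: 94 %
Remaining Capacity: 276 mAh
Full Charge Capacity: 296 mAh
isSOHGood: Yes
"""

# Hypothetical selection of fields to treat as warnings (an assumption).
WARNING_FLAGS = (
    "Battery Pack Missing",
    "Battery Replacement required",
    "Remaining Capacity Low",
    "Pack is about to fail & should be replaced",
)


def parse_bbu_status(output):
    """Map 'key : value' lines to a dict, skipping headers with no value."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def raised_flags(fields):
    """Return the names of the warning flags that are set to 'Yes'."""
    return [name for name in WARNING_FLAGS if fields.get(name) == "Yes"]


def charge_ratio(fields):
    """Remaining capacity divided by full charge capacity (both in mAh)."""
    remaining = int(fields["Remaining Capacity"].split()[0])
    full = int(fields["Full Charge Capacity"].split()[0])
    return remaining / full


if __name__ == "__main__":
    status = parse_bbu_status(SAMPLE_BBU)
    print(raised_flags(status))            # no flags raised for this sample
    print(round(charge_ratio(status), 2))  # 276 / 296, roughly 0.93
```

The computed ratio is close to the reported "Relative State of Charge" of 94 %, so nothing in the self-reported status points at a bad battery.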

Would it be possible to swap it? If the host is out of warranty (OOW), we could get a replacement from the nodes being decommissioned in T267932. Let me know if that is possible :)

Event Timeline

@razzi the error in icinga is CRITICAL: 12 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
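
To make the alert above more concrete, here is a minimal sketch in Python of the kind of check that could produce such a message from the `megacli -LDInfo` output. It is not the actual Icinga plugin: the function names, the exit codes and the OK message are assumptions; only the CRITICAL wording is copied from the alert.

```python
import re

# One logical drive, copied from the LDInfo output in the task description.
SAMPLE_LDINFO = """\
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
"""


def current_write_policies(ldinfo_output):
    """Return the write policy (first field) of each 'Current Cache Policy' line."""
    policies = []
    for line in ldinfo_output.splitlines():
        match = re.match(r"\s*Current Cache Policy\s*:\s*([^,]+)", line)
        if match:
            policies.append(match.group(1).strip())
    return policies


def check_write_cache(ldinfo_output, expected="WriteBack"):
    """Return (exit_code, message); 2 means critical, 0 means OK (assumed codes)."""
    policies = current_write_policies(ldinfo_output)
    bad = [policy for policy in policies if policy != expected]
    if bad:
        return 2, (
            f"CRITICAL: {len(bad)} LD(s) must have write cache policy "
            f"{expected}, currently using: {', '.join(bad)}"
        )
    # The OK wording below is made up for this sketch.
    return 0, f"OK: {len(policies)} LD(s) using write cache policy {expected}"


if __name__ == "__main__":
    code, message = check_write_cache(SAMPLE_LDINFO)
    print(code, message)
```

Fed the full twelve-drive output instead of the one-drive sample, the same logic would list WriteThrough twelve times, as in the alert.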

The BBU is the Battery Backup Unit. It provides backup power for the RAID controller's write cache, so that if there is a power outage, data that is in the cache but not yet saved to disk can still be preserved. With the WriteBack setting, data only needs to reach the controller's cache to be considered "written" by the OS, whereas with WriteThrough the cache is skipped and the disk itself has to store the data before the write is acknowledged to the OS. WriteBack generally gives better write performance, but it doesn't work properly if the BBU is faulty (I'm not sure that is the case here, but I don't have many other ideas about what's happening).
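
The difference between the two policies can be illustrated with a toy model. This is a deliberately simplified sketch in Python, not a description of how the controller firmware behaves: the class and method names are invented, and a "working BBU" is modelled simply as the cache surviving a power loss.

```python
class ToyRaidController:
    """Toy model of a RAID controller write cache (illustrative only)."""

    def __init__(self, policy, has_working_bbu):
        self.policy = policy                  # "WriteBack" or "WriteThrough"
        self.has_working_bbu = has_working_bbu
        self.cache = []                       # acknowledged, not yet on disk
        self.disk = []                        # safely stored on disk

    def write(self, block):
        """Store a block and return where it was when the OS got the ack."""
        if self.policy == "WriteBack":
            self.cache.append(block)          # ack as soon as the cache has it
            return "cache"
        self.disk.append(block)               # WriteThrough: ack after the disk
        return "disk"

    def flush(self):
        """Move everything from the cache to disk."""
        self.disk.extend(self.cache)
        self.cache.clear()

    def power_loss(self):
        """Simulate an outage and return acknowledged blocks that were lost."""
        if self.has_working_bbu:
            self.flush()                      # battery kept the cache alive
            return []
        lost = list(self.cache)
        self.cache.clear()
        return lost


if __name__ == "__main__":
    for policy, bbu in (("WriteBack", True), ("WriteBack", False), ("WriteThrough", False)):
        controller = ToyRaidController(policy, bbu)
        acked_from = controller.write("block-1")
        print(policy, "bbu ok" if bbu else "bbu bad", acked_from, controller.power_loss())
```

In this model only the WriteBack controller without a working BBU loses an acknowledged block, which is the risk the battery is there to cover.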

crusnov added a project: DC-Ops.

@wiki_willy This server has been out of warranty for 1 year (purchased in 2017). I can probably find a used BBU in our decom servers. Let me know if this is how you want to proceed.

Hi @Cmjohnson - it sounds like they need it in production. @elukey or @Ottomata - let us know if there's a particular decom'd host you want us to grab the part from. Thanks, Willy

@elukey @Ottomata I would like to do this Monday morning, around 11am my local time (1600 UTC).

@Cmjohnson perfect! @razzi might be around as well; in that case we'll let the two of you sync and do the work :)

My morning got away from me and this is rescheduled for tomorrow 1400UTC (1000EST)

Swapped the BBU; the server is back up and handed back to @elukey

elukey@analytics1066:~$ sudo megacli -LDInfo -Lall -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default
Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU
Disk Cache Policy   : Disk's Default

\o/ success