db1067 is reporting a very high BBU temperature (76 C, with the firmware status flagging Temperature as High):
root@db1067:~# megacli -AdpBbuCmd -a0

BBU status for Adapter: 0

BatteryType: BBU
Voltage: 3957 mV
Current: 0 mA
Temperature: 76 C
Battery State: Optimal

BBU Firmware Status:
Charging Status : None
Voltage : OK
Temperature : High
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
I2c Errors Detected : No
Battery Pack Missing : No
Battery Replacement required : No
Remaining Capacity Low : No
Periodic Learn Required : No
Transparent Learn : No
No space to cache offload : No
Pack is about to fail & should be replaced : No
Cache Offload premium feature required : No
Module microcode update required : No

BBU GasGauge Status: 0x0238
Relative State of Charge: 100 %
Charger Status: Complete
Remaining Capacity: 542 mAh
Full Charge Capacity: 542 mAh
isSOHGood: Yes
Battery backup charge time : 0 hours

BBU Capacity Info for Adapter: 0
Relative State of Charge: 100 %
Absolute State of charge: 0 %
Remaining Capacity: 542 mAh
Full Charge Capacity: 542 mAh
Run time to empty: Battery is not being charged.
Average time to empty: 43 Min.
Estimated Time to full recharge: Battery is not being charged.
Cycle Count: 1
Max Error = 0 %
Remaining Capacity Alarm = 0 mAh
Remining Time Alarm = 0 Min

BBU Design Info for Adapter: 0
Date of Manufacture: 07/18, 2011
Design Capacity: 90 mAh
Design Voltage: 0 mV
Specification Info: 0
Serial Number: 0
Pack Stat Configuration: 0x0000
Manufacture Name:
Firmware Version : 0148 03
Device Name:
Device Chemistry:
Battery FRU: N/A
Module Version = 0148 03
Transparent Learn = 1
App Data = 0

BBU Properties for Adapter: 0
Auto Learn Period: 90 Days
Next Learn time: None
Learn Delay Interval:0 Hours
Auto-Learn Mode: Disabled

Exit Code: 0x00
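For checking other hosts the same way, here is a rough Python sketch that scans `megacli -AdpBbuCmd` output for the two temperature fields seen above. The function names `parse_bbu` and `bbu_problems` and the 60 C threshold are assumptions made for illustration, not vendor limits or an existing tool; the sample text is an excerpt of the db1067 output.

```python
import re

# Excerpt of the db1067 output, one field per line.
SAMPLE = """\
BBU status for Adapter: 0
Temperature: 76 C
Battery State: Optimal
BBU Firmware Status:
Charging Status : None
Voltage : OK
Temperature : High
Battery Replacement required : No
"""


def parse_bbu(text):
    """Return (temperature_c, flags) parsed from 'key: value' lines."""
    temperature_c = None
    flags = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        reading = re.fullmatch(r"(\d+) C", value)
        if key == "Temperature" and reading:
            # Numeric sensor reading, e.g. "Temperature: 76 C".
            temperature_c = int(reading.group(1))
        elif value:
            # Everything else, including the firmware "Temperature : High" flag.
            flags[key] = value
    return temperature_c, flags


def bbu_problems(text, max_temp_c=60):
    """List problems found; max_temp_c is an assumed threshold, not a vendor limit."""
    temperature_c, flags = parse_bbu(text)
    problems = []
    if temperature_c is not None and temperature_c > max_temp_c:
        problems.append(f"temperature {temperature_c} C exceeds {max_temp_c} C")
    if flags.get("Temperature") == "High":
        problems.append("firmware reports Temperature: High")
    if flags.get("Battery Replacement required") == "Yes":
        problems.append("firmware requests battery replacement")
    return problems


if __name__ == "__main__":
    for problem in bbu_problems(SAMPLE):
        print(problem)
```

The numeric reading and the firmware flag share the key `Temperature`, so the sketch tells them apart by the shape of the value instead of storing both under one dictionary key.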
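The RAID cache policy is currently WriteThrough instead of the default WriteBack, even though the BBU reports no fault other than the high temperature (Battery State is Optimal and no replacement is requested).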
root@db1067:~# megacli -ldinfo -l0 -a0 | grep Policy
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
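The default/current comparison can also be automated. The following is a minimal Python sketch, assuming the `grep Policy` output is available as text; `cache_policy_mismatch` is a hypothetical helper name, and the sample is copied from the db1067 output.

```python
# Lines from "megacli -ldinfo -l0 -a0 | grep Policy" on db1067.
POLICY_SAMPLE = """\
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
"""


def cache_policy_mismatch(text):
    """Return (default, current) pairs where the cache policy settings differ."""
    policies = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            policies[key.strip()] = [part.strip() for part in value.split(",")]
    default = policies.get("Default Cache Policy", [])
    current = policies.get("Current Cache Policy", [])
    # Settings are compared position by position (write mode, read-ahead, ...).
    return [(d, c) for d, c in zip(default, current) if d != c]


if __name__ == "__main__":
    for default, current in cache_policy_mismatch(POLICY_SAMPLE):
        print(f"default {default}, currently {current}")
```

An empty result means the controller is running with its default cache policy; on db1067 the only difference is the write mode.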
Given that this is the s1 candidate master, it is probably safer to just replace the BBU.
I have been talking to @Cmjohnson and he is going to check whether we have spare BBUs.