Creating this just for the record.
Looks like db1130 is having issues with the BBU, and the current cache policy has changed from WriteBack to WriteThrough:
root@db1130:~# megacli -LDInfo -Lall -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 4.364 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 4.364 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
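The comparison being made by eye above (Default vs. Current cache policy) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the sample text is copied from the output above, and it assumes a single virtual drive per host.

```python
import re

def cache_policies(ldinfo_text):
    """Return (default, current) cache policy strings from `megacli -LDInfo` output.

    Assumes one virtual drive; with several, the last one seen wins.
    """
    policies = {}
    for line in ldinfo_text.splitlines():
        match = re.match(r"\s*(Default|Current) Cache Policy\s*:\s*(.+)", line)
        if match:
            policies[match.group(1)] = match.group(2).strip()
    return policies.get("Default"), policies.get("Current")

def write_mode(policy):
    """The first comma-separated field is the write mode (WriteBack or WriteThrough)."""
    return policy.split(",")[0].strip()

# Two lines copied from the db1130 output above.
SAMPLE = """\
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
"""

default, current = cache_policies(SAMPLE)
degraded = write_mode(default) != write_mode(current)
print(f"default={write_mode(default)} current={write_mode(current)} degraded={degraded}")
```

A mismatch between the two write modes is what the MegaRAID check reports on; here it flags the drop to WriteThrough.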
These are the HW logs:
Record  Date/Time            Source  Severity      Description
14      10/30/2019 01:16:24  system  Non-Critical  The PERC1 battery is low.
15      10/30/2019 01:18:34  system  Ok            The PERC1 battery is operating normally.
16      12/13/2019 18:14:21  system  Non-Critical  The PERC1 battery is low.
17      12/13/2019 18:21:56  system  Ok            The PERC1 battery is operating normally.
18      12/13/2019 20:13:31  system  Non-Critical  The PERC1 battery is low.
19      12/13/2019 20:21:06  system  Ok            The PERC1 battery is operating normally.
20      12/13/2019 21:14:11  system  Non-Critical  The PERC1 battery is low.
21      12/13/2019 21:21:46  system  Ok            The PERC1 battery is operating normally.
22      12/13/2019 22:13:46  system  Non-Critical  The PERC1 battery is low.
23      12/13/2019 22:21:21  system  Ok            The PERC1 battery is operating normally.
24      12/13/2019 23:13:21  system  Non-Critical  The PERC1 battery is low.
25      12/13/2019 23:22:01  system  Ok            The PERC1 battery is operating normally.
26      12/14/2019 00:12:56  system  Non-Critical  The PERC1 battery is low.
27      12/14/2019 00:21:31  system  Ok            The PERC1 battery is operating normally.
28      12/14/2019 01:12:31  system  Non-Critical  The PERC1 battery is low.
29      12/14/2019 01:22:16  system  Ok            The PERC1 battery is operating normally.
30      12/14/2019 02:13:06  system  Non-Critical  The PERC1 battery is low.
31      12/14/2019 02:21:51  system  Ok            The PERC1 battery is operating normally.
32      12/14/2019 03:12:46  system  Non-Critical  The PERC1 battery is low.
33      12/14/2019 03:22:26  system  Ok            The PERC1 battery is operating normally.
34      12/14/2019 04:12:21  system  Non-Critical  The PERC1 battery is low.
35      12/14/2019 04:22:06  system  Ok            The PERC1 battery is operating normally.
36      12/14/2019 05:11:56  system  Non-Critical  The PERC1 battery is low.
37      12/14/2019 05:21:36  system  Ok            The PERC1 battery is operating normally.
38      12/14/2019 06:11:31  system  Non-Critical  The PERC1 battery is low.
39      12/14/2019 06:22:16  system  Ok            The PERC1 battery is operating normally.
40      12/14/2019 07:12:06  system  Non-Critical  The PERC1 battery is low.
41      12/14/2019 07:21:56  system  Ok            The PERC1 battery is operating normally.
42      12/14/2019 08:11:46  system  Non-Critical  The PERC1 battery is low.
43      12/14/2019 08:22:31  system  Ok            The PERC1 battery is operating normally.
44      12/14/2019 09:11:16  system  Non-Critical  The PERC1 battery is low.
45      12/14/2019 09:22:11  system  Ok            The PERC1 battery is operating normally.
46      12/14/2019 10:10:56  system  Non-Critical  The PERC1 battery is low.
47      12/14/2019 10:12:01  system  Ok            The PERC1 battery is operating normally.
48      12/14/2019 11:10:31  system  Non-Critical  The PERC1 battery is low.
49      12/14/2019 11:12:41  system  Ok            The PERC1 battery is operating normally.
50      12/14/2019 12:10:06  system  Non-Critical  The PERC1 battery is low.
51      12/14/2019 12:12:11  system  Ok            The PERC1 battery is operating normally.
52      12/14/2019 13:10:46  system  Non-Critical  The PERC1 battery is low.
53      12/14/2019 13:12:56  system  Ok            The PERC1 battery is operating normally.
54      12/14/2019 14:10:16  system  Non-Critical  The PERC1 battery is low.
55      12/14/2019 14:12:31  system  Ok            The PERC1 battery is operating normally.
56      12/14/2019 15:09:56  system  Non-Critical  The PERC1 battery is low.
57      12/14/2019 15:12:01  system  Ok            The PERC1 battery is operating normally.
58      12/14/2019 16:09:26  system  Non-Critical  The PERC1 battery is low.
59      12/14/2019 16:12:46  system  Ok            The PERC1 battery is operating normally.
60      12/14/2019 17:09:06  system  Non-Critical  The PERC1 battery is low.
61      12/14/2019 17:12:16  system  Ok            The PERC1 battery is operating normally.
62      12/14/2019 18:08:36  system  Non-Critical  The PERC1 battery is low.
63      12/14/2019 18:13:01  system  Ok            The PERC1 battery is operating normally.
64      12/14/2019 19:09:21  system  Non-Critical  The PERC1 battery is low.
65      12/14/2019 19:12:36  system  Ok            The PERC1 battery is operating normally.
66      12/14/2019 20:08:56  system  Non-Critical  The PERC1 battery is low.
67      12/14/2019 20:13:11  system  Ok            The PERC1 battery is operating normally.
68      12/14/2019 21:08:26  system  Non-Critical  The PERC1 battery is low.
69      12/14/2019 21:12:51  system  Ok            The PERC1 battery is operating normally.
70      12/14/2019 21:52:56  system  Non-Critical  The PERC1 battery is low.
71      12/14/2019 22:02:41  system  Ok            The PERC1 battery is operating normally.
72      12/14/2019 22:08:06  system  Non-Critical  The PERC1 battery is low.
73      12/14/2019 22:12:21  system  Ok            The PERC1 battery is operating normally.
74      12/14/2019 23:07:36  system  Non-Critical  The PERC1 battery is low.
75      12/14/2019 23:13:06  system  Ok            The PERC1 battery is operating normally.
76      12/15/2019 00:08:21  system  Non-Critical  The PERC1 battery is low.
77      12/15/2019 00:12:41  system  Ok            The PERC1 battery is operating normally.
78      12/15/2019 01:07:56  system  Non-Critical  The PERC1 battery is low.
79      12/15/2019 01:23:06  system  Ok            The PERC1 battery is operating normally.
80      12/15/2019 02:07:31  system  Non-Critical  The PERC1 battery is low.
81      12/15/2019 02:12:56  system  Ok            The PERC1 battery is operating normally.
82      12/15/2019 03:02:46  system  Non-Critical  The PERC1 battery is low.
83      12/15/2019 03:23:21  system  Ok            The PERC1 battery is operating normally.
84      12/15/2019 03:33:01  system  Non-Critical  The PERC1 battery is low.
85      12/15/2019 03:42:51  system  Ok            The PERC1 battery is operating normally.
86      12/15/2019 03:52:31  system  Non-Critical  The PERC1 battery is low.
87      12/15/2019 04:13:11  system  Ok            The PERC1 battery is operating normally.
88      12/15/2019 04:22:56  system  Non-Critical  The PERC1 battery is low.
89      12/15/2019 04:32:41  system  Ok            The PERC1 battery is operating normally.
90      12/15/2019 05:01:56  system  Non-Critical  The PERC1 battery is low.
91      12/15/2019 05:12:46  system  Ok            The PERC1 battery is operating normally.
92      12/15/2019 05:43:06  system  Non-Critical  The PERC1 battery is low.
93      12/15/2019 05:52:46  system  Ok            The PERC1 battery is operating normally.
94      12/15/2019 06:02:36  system  Non-Critical  The PERC1 battery is low.
95      12/15/2019 06:13:21  system  Ok            The PERC1 battery is operating normally.
96      12/15/2019 06:23:11  system  Non-Critical  The PERC1 battery is low.
97      12/15/2019 06:32:51  system  Ok            The PERC1 battery is operating normally.
98      12/15/2019 06:43:46  system  Non-Critical  The PERC1 battery is low.
99      12/15/2019 06:53:31  system  Ok            The PERC1 battery is operating normally.
100     12/15/2019 07:06:31  system  Non-Critical  The PERC1 battery is low.
101     12/15/2019 07:13:01  system  Ok            The PERC1 battery is operating normally.
102     12/15/2019 08:02:51  system  Non-Critical  The PERC1 battery is low.
103     12/15/2019 08:13:41  system  Ok            The PERC1 battery is operating normally.
104     12/15/2019 08:23:26  system  Non-Critical  The PERC1 battery is low.
105     12/15/2019 08:33:06  system  Ok            The PERC1 battery is operating normally.
106     12/15/2019 09:03:31  system  Non-Critical  The PERC1 battery is low.
107     12/15/2019 09:13:11  system  Ok            The PERC1 battery is operating normally.
108     12/15/2019 09:23:01  system  Non-Critical  The PERC1 battery is low.
109     12/15/2019 09:33:51  system  Ok            The PERC1 battery is operating normally.
110     12/15/2019 09:43:36  system  Non-Critical  The PERC1 battery is low.
111     12/15/2019 10:03:01  system  Ok            The PERC1 battery is operating normally.
112     12/15/2019 10:05:16  system  Non-Critical  The PERC1 battery is low.
113     12/15/2019 10:43:11  system  Ok            The PERC1 battery is operating normally.
114     12/15/2019 10:54:01  system  Non-Critical  The PERC1 battery is low.
115     12/15/2019 11:32:56  system  Ok            The PERC1 battery is operating normally.
116     12/15/2019 11:53:36  system  Non-Critical  The PERC1 battery is low.
117     12/15/2019 12:03:21  system  Ok            The PERC1 battery is operating normally.
118     12/15/2019 12:05:31  system  Non-Critical  The PERC1 battery is low.
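To put numbers on the flapping, the low/normal events in the log above can be paired up and timed. The sketch below hard-codes a handful of the 12/13/2019 records as (Date/Time, Severity) tuples; the function names are invented for this example and it is not how the log was actually analysed.

```python
from datetime import datetime

# A few records copied from the log above: (Date/Time, Severity).
RECORDS = [
    ("12/13/2019 18:14:21", "Non-Critical"),
    ("12/13/2019 18:21:56", "Ok"),
    ("12/13/2019 20:13:31", "Non-Critical"),
    ("12/13/2019 20:21:06", "Ok"),
    ("12/13/2019 21:14:11", "Non-Critical"),
    ("12/13/2019 21:21:46", "Ok"),
]

def parse(ts):
    """Parse the MM/DD/YYYY HH:MM:SS timestamps used in the hardware log."""
    return datetime.strptime(ts, "%m/%d/%Y %H:%M:%S")

def low_episodes(records):
    """Pair each 'battery is low' event with the next 'Ok' event.

    Returns a list of (start, duration) tuples.
    """
    episodes = []
    start = None
    for ts, severity in records:
        when = parse(ts)
        if severity == "Non-Critical":
            start = when
        elif severity == "Ok" and start is not None:
            episodes.append((start, when - start))
            start = None
    return episodes

episodes = low_episodes(RECORDS)
for start, duration in episodes:
    print(f"{start:%Y-%m-%d %H:%M:%S}  low for {duration}")

# Time between the starts of consecutive low episodes.
gaps = [later[0] - earlier[0] for earlier, later in zip(episodes, episodes[1:])]
print("gaps between low events:", [str(gap) for gap in gaps])
```

For these sample records each low episode lasts a little under eight minutes and, from 20:13 on, the events are roughly an hour apart.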
Looks like re-learn was enabled:
root@db1130:~# megacli -AdpBbuCmd -a0

BBU status for Adapter: 0

BatteryType: BBU
Battery State: Unknown
Battery backup charge time : 0 hours

BBU Capacity Info for Adapter: 0

  Relative State of Charge: 59 %
  Absolute State of charge: 0 %
  Remaining Capacity: 94 mAh
  Full Charge Capacity: 161 mAh
  Run time to empty: Battery is not being charged.
  Average time to empty: 8 Min.
  Estimated Time to full recharge: Battery is not being charged.
  Cycle Count: 11
Max Error = 0 %
Remaining Capacity Alarm = 0 mAh
Remining Time Alarm = 0 Min

BBU Design Info for Adapter: 0

  Date of Manufacture: 00/00, 0000
  Design Capacity: 0 mAh
  Design Voltage: 0 mV
  Specification Info: 0
  Serial Number: 0
  Pack Stat Configuration: 0x0000
  Manufacture Name: 0x129
  Firmware Version : 0.6
  Device Name:
  Device Chemistry:
  Battery FRU: N/A
  Module Version = 0.6
  Transparent Learn = 1
  App Data = 0

BBU Properties for Adapter: 0

  Auto Learn Period: 90 Days
  Next Learn time: Tue Jan 28 00:40:07 2020
  Learn Delay Interval:0 Hours
  Auto-Learn Mode: Transparent

Exit Code: 0x00
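The learn-related fields at the end of that output are the ones that matter here. As a rough sketch (the helper name is hypothetical, and the sample lines are copied from the output above), they can be pulled out like this; note that parsing the date assumes an English/C locale for the day and month names.

```python
import re
from datetime import datetime

# "BBU Properties" lines copied from the db1130 output above.
SAMPLE = """\
Auto Learn Period: 90 Days
Next Learn time: Tue Jan 28 00:40:07 2020
Learn Delay Interval:0 Hours
Auto-Learn Mode: Transparent
"""

def parse_fields(text):
    """Parse 'key: value' and 'key = value' lines into a dict, skipping empty values."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:=]+?)\s*[:=]\s*(\S.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields

props = parse_fields(SAMPLE)
period_days = int(props["Auto Learn Period"].split()[0])
next_learn = datetime.strptime(props["Next Learn time"], "%a %b %d %H:%M:%S %Y")

print(f"auto-learn mode: {props['Auto-Learn Mode']}")
print(f"period: {period_days} days, next learn: {next_learn:%Y-%m-%d %H:%M}")
```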
I have forced a re-learn cycle:
root@db1130:~# megacli -AdpBbuCmd -BbuLearn -aAll

Adapter 0: BBU Learn Succeeded.
And we got the recovery:
[07:01:21] <+icinga-wm> RECOVERY - MegaRAID on db1130 is OK: OK: optimal, 1 logical, 6 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
And HW logs:
Record  Date/Time            Source  Severity      Description
119     12/16/2019 05:58:02  system  Ok            The PERC1 battery is operating normally.
Let's see if the battery alerts stop after the re-learn cycle.
We should check (and disable if enabled) the learning mode for the following hosts:
db11[21-38]
db21[03-35]
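For going through those hosts, the bracket ranges can be expanded into individual hostnames. A minimal sketch, with a made-up helper name; the `ssh` wrapper in the printed lines is only an assumption about how the check would be run, while the megacli command is the one used above on db1130.

```python
import re

def expand(pattern):
    """Expand 'db11[21-38]' into ['db1121', ..., 'db1138'], preserving zero-padding."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if not match:
        return [pattern]
    prefix, first, last, suffix = match.groups()
    width = len(first)
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(first), int(last) + 1)
    ]

hosts = expand("db11[21-38]") + expand("db21[03-35]")
print(f"{len(hosts)} hosts to check")

# Print (not run) one check command per host; the ssh form is hypothetical.
for host in hosts:
    print(f"ssh {host} megacli -AdpBbuCmd -a0")
```

That comes to 18 hosts in the first range and 33 in the second.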