
BBU issues on db1054 (s2 primary master)
Closed, Declined, Public

Description

db1054 had its RAID cache policy switched from WriteBack to WriteThrough (WriteThru). These are the controller event logs:

seqNum: 0x0000221e
Time: Thu May 17 04:11:12 2018

Code: 0x000000a2
Class: 1
Locale: 0x08
Event Description: Current capacity of the battery is below threshold
Event Data:
===========
None


seqNum: 0x0000221f
Time: Thu May 17 04:11:12 2018

Code: 0x000000c3
Class: 1
Locale: 0x08
Event Description: BBU disabled; changing WB virtual disks to WT, Forced WB VDs are not affected
Event Data:
===========
None


seqNum: 0x00002220
Time: Thu May 17 04:11:12 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=00,ap=0,dc=0] from [ID=00,dcp=01,ccp=01,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0


seqNum: 0x00002221
Time: Thu May 17 04:20:57 2018

Code: 0x000000c2
Class: 0
Locale: 0x08
Event Description: BBU enabled; changing WT virtual disks to WB
Event Data:
===========
None


seqNum: 0x00002222
Time: Thu May 17 04:20:57 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=01,ap=0,dc=0] from [ID=00,dcp=01,ccp=00,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0


seqNum: 0x00002223
Time: Thu May 17 04:31:47 2018

Code: 0x000000c3
Class: 1
Locale: 0x08
Event Description: BBU disabled; changing WB virtual disks to WT, Forced WB VDs are not affected
Event Data:
===========
None


seqNum: 0x00002224
Time: Thu May 17 04:31:47 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=00,ap=0,dc=0] from [ID=00,dcp=01,ccp=01,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0


seqNum: 0x00002225
Time: Thu May 17 04:41:32 2018

Code: 0x000000c2
Class: 0
Locale: 0x08
Event Description: BBU enabled; changing WT virtual disks to WB
Event Data:
===========
None


seqNum: 0x00002226
Time: Thu May 17 04:41:32 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=01,ap=0,dc=0] from [ID=00,dcp=01,ccp=00,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0


seqNum: 0x00002227
Time: Thu May 17 04:51:17 2018

Code: 0x000000c3
Class: 1
Locale: 0x08
Event Description: BBU disabled; changing WB virtual disks to WT, Forced WB VDs are not affected
Event Data:
===========
None


seqNum: 0x00002228
Time: Thu May 17 04:51:17 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=00,ap=0,dc=0] from [ID=00,dcp=01,ccp=01,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0


seqNum: 0x00002229
Time: Thu May 17 05:01:02 2018

Code: 0x000000c2
Class: 0
Locale: 0x08
Event Description: BBU enabled; changing WT virtual disks to WB
Event Data:
===========
None


seqNum: 0x0000222a
Time: Thu May 17 05:01:02 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=01,ap=0,dc=0] from [ID=00,dcp=01,ccp=00,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0


seqNum: 0x0000222b
Time: Thu May 17 05:11:52 2018

Code: 0x000000c3
Class: 1
Locale: 0x08
Event Description: BBU disabled; changing WB virtual disks to WT, Forced WB VDs are not affected
Event Data:
===========
None


seqNum: 0x0000222c
Time: Thu May 17 05:11:52 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=00,ap=0,dc=0] from [ID=00,dcp=01,ccp=01,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0


seqNum: 0x0000222d
Time: Thu May 17 05:21:37 2018

Code: 0x000000c2
Class: 0
Locale: 0x08
Event Description: BBU enabled; changing WT virtual disks to WB
Event Data:
===========
None


seqNum: 0x0000222e
Time: Thu May 17 05:21:37 2018

Code: 0x00000036
Class: 0
Locale: 0x01
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=01,ap=0,dc=0] from [ID=00,dcp=01,ccp=00,ap=0,dc=0]
Event Data:
===========
Target Id: 0
Previous LD Properties
Access Policy: 0
Current Cache Policy: 0
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
New LD Properties
Access Policy: 0
Current Cache Policy: 1
Default Cache Policy: 1
Disk Cache Policy: 0
Name:
NoBgi: 0
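The "Policy change" lines encode the virtual disk's settings as a compact descriptor. Going by the "Event Data" blocks that follow each one, ccp is the Current Cache Policy, dcp the Default Cache Policy, ap the Access Policy, dc the Disk Cache Policy and ID the Target Id. The mapping 1 = WriteBack, 0 = WriteThrough is an assumption inferred from the sequence above ("BBU disabled; changing WB virtual disks to WT" is always followed by ccp going from 01 to 00). A minimal Python sketch that decodes one such line, with hypothetical helper names:

```python
import re

# Descriptor keys -> meanings, inferred from the "Event Data" blocks in the log.
FIELD_NAMES = {
    "ID": "target_id",
    "dcp": "default_cache_policy",
    "ccp": "current_cache_policy",
    "ap": "access_policy",
    "dc": "disk_cache_policy",
}

# Assumption: 1 = WriteBack, 0 = WriteThrough (inferred from the WB -> WT events).
CACHE_POLICY = {0: "WriteThrough", 1: "WriteBack"}

POLICY_RE = re.compile(
    r"Policy change on VD (?P<vd>\S+) to \[(?P<new>[^\]]*)\] from \[(?P<old>[^\]]*)\]"
)


def parse_descriptor(text):
    """Turn 'ID=00,dcp=01,...' into {'target_id': 0, 'default_cache_policy': 1, ...}."""
    fields = {}
    for pair in text.split(","):
        key, value = pair.split("=")
        # Values are parsed as hex (assumption); for 00/01 hex and decimal agree.
        fields[FIELD_NAMES.get(key, key)] = int(value, 16)
    return fields


def describe_policy_change(line):
    """Summarise the current-cache-policy transition in one event line, or None."""
    match = POLICY_RE.search(line)
    if match is None:
        return None
    old = parse_descriptor(match.group("old"))
    new = parse_descriptor(match.group("new"))
    before = CACHE_POLICY.get(old["current_cache_policy"], "unknown")
    after = CACHE_POLICY.get(new["current_cache_policy"], "unknown")
    return f"VD {match.group('vd')}: {before} -> {after}"


line = ("Event Description: Policy change on VD 00/0 to "
        "[ID=00,dcp=01,ccp=00,ap=0,dc=0] from [ID=00,dcp=01,ccp=01,ap=0,dc=0]")
print(describe_policy_change(line))  # prints: VD 00/0: WriteBack -> WriteThrough
```

Note that dcp stays at 01 throughout: only the current policy flips, the configured default remains WriteBack.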

We should prioritize failing over this master.
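The timestamps in the log show the controller flapping between the two policies roughly every ten minutes. A small Python sketch that pairs each "BBU disabled" event with the next "BBU enabled" event to measure how long the VD ran in WriteThrough; the event list is copied by hand from the log above, and the function name is hypothetical:

```python
from datetime import datetime

# (Time, Event Description) pairs taken from the controller log above.
EVENTS = [
    ("Thu May 17 04:11:12 2018", "BBU disabled"),
    ("Thu May 17 04:20:57 2018", "BBU enabled"),
    ("Thu May 17 04:31:47 2018", "BBU disabled"),
    ("Thu May 17 04:41:32 2018", "BBU enabled"),
    ("Thu May 17 04:51:17 2018", "BBU disabled"),
    ("Thu May 17 05:01:02 2018", "BBU enabled"),
    ("Thu May 17 05:11:52 2018", "BBU disabled"),
    ("Thu May 17 05:21:37 2018", "BBU enabled"),
]


def write_through_windows(events):
    """Return (start, seconds in WriteThrough) for each disabled -> enabled pair."""
    windows = []
    start = None
    for stamp, what in events:
        t = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        if what == "BBU disabled" and start is None:
            start = t
        elif what == "BBU enabled" and start is not None:
            windows.append((start, int((t - start).total_seconds())))
            start = None
    return windows


for start, seconds in write_through_windows(EVENTS):
    print(f"{start:%H:%M:%S}  WriteThrough for {seconds // 60}m{seconds % 60:02d}s")
```

With these events, each of the four WriteThrough windows lasts 9m45s.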

Event Timeline

This is the battery status:

BBU status for Adapter: 0

BatteryType: BBU
Battery State: Unknown
  Battery backup charge time : 0 hours

BBU Capacity Info for Adapter: 0

  Relative State of Charge: 17 %
  Absolute State of charge: 0 %
  Remaining Capacity: 94 mAh
  Full Charge Capacity: 581 mAh
  Run time to empty: Battery is not being charged.
  Average time to empty: 8 Min.
  Estimated Time to full recharge: Battery is not being charged.
  Cycle Count: 4
Max Error = 0 %
Remaining Capacity Alarm = 0 mAh
Remining Time Alarm = 0 Min

BBU Design Info for Adapter: 0

  Date of Manufacture: 07/18, 2011
  Design Capacity: 460 mAh
  Design Voltage: 0 mV
  Specification Info: 0
  Serial Number: 0
  Pack Stat Configuration: 0x0000
  Manufacture Name:
  Firmware Version   : 0148 03
  Device Name:
  Device Chemistry:
  Battery FRU: N/A
Module Version = 0148 03
  Transparent Learn = 1
  App Data = 1

BBU Properties for Adapter: 0

  Auto Learn Period: 90 Days
  Next Learn time: None  Learn Delay Interval:0 Hours
  Auto-Learn Mode: Disabled

Exit Code: 0x00
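The dump above is a list of "Key: value" (and a few "Key = value") lines. A minimal Python sketch that pulls out the fields relevant to this ticket, using an excerpt of the output as input. The helper names are hypothetical, and no charge threshold is assumed, since the controller's own threshold is not shown in the output:

```python
import re

SAMPLE = """\
BatteryType: BBU
Battery State: Unknown
  Relative State of Charge: 17 %
  Remaining Capacity: 94 mAh
  Full Charge Capacity: 581 mAh
  Estimated Time to full recharge: Battery is not being charged.
  Design Capacity: 460 mAh
  Auto-Learn Mode: Disabled
"""


def parse_bbu_status(text):
    """Collect 'Key: value' / 'Key = value' lines into a dict (last one wins)."""
    fields = {}
    for raw in text.splitlines():
        match = re.match(r"\s*([^:=]+?)\s*[:=]\s*(.*)$", raw)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def summarize(fields):
    """Pick out the numbers that matter here; values like '94 mAh' keep the leading integer."""
    return {
        "relative_charge_pct": int(fields["Relative State of Charge"].split()[0]),
        "remaining_mah": int(fields["Remaining Capacity"].split()[0]),
        "full_charge_mah": int(fields["Full Charge Capacity"].split()[0]),
        "design_mah": int(fields["Design Capacity"].split()[0]),
        "charging": "not being charged" not in fields["Estimated Time to full recharge"],
        "auto_learn": fields["Auto-Learn Mode"],
    }


print(summarize(parse_bbu_status(SAMPLE)))
```

For this host the summary shows 17 % relative charge, no charging in progress and auto-learn disabled, which matches the WriteBack/WriteThrough flapping in the event log.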

Mentioned in SAL (#wikimedia-operations) [2018-05-17T05:20:07Z] <marostegui> Force BBU learn cycle on db1054 - T194867

After forcing a relearn cycle:

~/icinga-wm 7:54> RECOVERY - MegaRAID on db1054 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy

I am going to close this, as we will not replace the BBU. We will fail over the master (T194870) and decommission the host (T186320).
This task can remain in the system as a record that this host has this issue and that it won't be fixed.

Vvjjkkii renamed this task from BBU issues on db1054 (s2 primary master) to cucaaaaaaa. (Jul 1 2018, 1:09 AM)
Vvjjkkii reopened this task as Open.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
Wong128hk renamed this task from cucaaaaaaa to BBU issues on db1054 (s2 primary master). (Jul 1 2018, 2:28 AM)
Wong128hk closed this task as Declined.
Wong128hk raised the priority of this task from High to Needs Triage.
Wong128hk updated the task description.
Wong128hk added a subscriber: Aklapper.