
analytics1032's BBU is not working correctly
Closed, Resolved · Public

Description

The WriteBack/WriteThrough alarms on analytics1032 have been flapping for the past two days, and the BBU still looks unhealthy even after a couple of relearn cycles:

elukey@analytics1032:~$ sudo megacli -AdpBbuCmd -GetBbuStatus -aALL

BBU status for Adapter: 0

BatteryType: BBU
Voltage: 3490 mV
Current: 0 mA
Temperature: 57 C
Battery State: Degraded(Need Attention)
		A manual learn is required.
BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : OK
  Temperature                             : OK
  Learn Cycle Requested	                  : Yes
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : Yes
  Periodic Learn Required                 : No
  Transparent Learn                       : No
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No

BBU GasGauge Status: 0x0438
Relative State of Charge: 13 %
Charger Status: Unknown
Remaining Capacity: 65 mAh
Full Charge Capacity: 509 mAh
isSOHGood: Yes

Exit Code: 0x00
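
A reading like the one above can also be checked mechanically. The following is a minimal Python sketch and not part of the original report: `parse_bbu_status` and `bbu_problems` are hypothetical helper names, and the rules for what counts as unhealthy are an assumption based only on the fields visible in this output.

```python
def parse_bbu_status(text):
    """Turn 'Key : Value' lines of GetBbuStatus output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; header lines with no value are skipped.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def bbu_problems(fields):
    """Return reasons the BBU looks unhealthy (assumed rules, not vendor guidance)."""
    problems = []
    state = fields.get("Battery State", "")
    if state != "Optimal":
        problems.append("Battery State is " + (state or "missing"))
    for flag in ("Remaining Capacity Low",
                 "Battery Replacement required",
                 "Battery Pack Missing"):
        if fields.get(flag) == "Yes":
            problems.append(flag)
    return problems


# A few lines copied from the output above.
sample = """\
Battery State: Degraded(Need Attention)
  Remaining Capacity Low                  : Yes
  Battery Replacement required            : No
Relative State of Charge: 13 %
"""
print(bbu_problems(parse_bbu_status(sample)))
```

With the lines above, the sketch flags both the degraded battery state and the low remaining capacity.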

Is there by any chance a spare used BBU that we could swap in to see if it works? :)

Event Timeline

elukey triaged this task as Medium priority. · May 9 2018, 5:28 AM
elukey created this task.
Restricted Application added a subscriber: Aklapper. · May 9 2018, 5:28 AM
elukey added a subscriber: Ottomata. · May 9 2018, 5:28 AM
Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. · May 15 2018, 3:46 PM
Cmjohnson moved this task from Up next to Not urgent on the ops-eqiad board. · Jun 11 2018, 3:47 PM

@elukey is this still an issue? I have a spare BBU that I can install. If so, please let me know when you would like to schedule the swap.

Hi @Cmjohnson! Yes, it is. We can swap it any time; just give me a 20-30 minute heads-up so I can drain the node and shut it down!

@elukey let's do this tomorrow morning. I will ping you when I get to the data center.

Mentioned in SAL (#wikimedia-operations) [2018-06-28T12:49:10Z] <elukey> stop hadoop daemons on analytics1032 + shutdown to swap BBU -T194234

Looks good!

elukey@analytics1032:~$ sudo megacli -AdpBbuCmd -GetBbuStatus -aALL

BBU status for Adapter: 0

BatteryType: BBU
Voltage: 3966 mV
Current: 161 mA
Temperature: 40 C
Battery State: Optimal
BBU Firmware Status:

  Charging Status              : Charging
  Voltage                                 : OK
  Temperature                             : OK
  Learn Cycle Requested	                  : Yes
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : No
  Periodic Learn Required                 : No
  Transparent Learn                       : Yes
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No

BBU GasGauge Status: 0x0128
Relative State of Charge: 94 %
Charger Status: In Progress
Remaining Capacity: 409 mAh
Full Charge Capacity: 439 mAh
isSOHGood: Yes

Exit Code: 0x00
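
To see what the swap changed, the two readings in this task can be compared field by field. This is a small Python sketch with values copied from the two outputs in this task; `changed_fields` is a hypothetical helper, and only a subset of fields is included.

```python
# Selected fields from the reading taken before the BBU swap.
before = {
    "Voltage": "3490 mV",
    "Temperature": "57 C",
    "Battery State": "Degraded(Need Attention)",
    "Charging Status": "None",
    "Remaining Capacity Low": "Yes",
    "Relative State of Charge": "13 %",
    "Remaining Capacity": "65 mAh",
    "Full Charge Capacity": "509 mAh",
    "isSOHGood": "Yes",
}

# The same fields from the reading taken after the swap.
after = {
    "Voltage": "3966 mV",
    "Temperature": "40 C",
    "Battery State": "Optimal",
    "Charging Status": "Charging",
    "Remaining Capacity Low": "No",
    "Relative State of Charge": "94 %",
    "Remaining Capacity": "409 mAh",
    "Full Charge Capacity": "439 mAh",
    "isSOHGood": "Yes",
}


def changed_fields(old, new):
    """Map each field present in both readings to (old, new) if the value differs."""
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


for name, (was, now) in changed_fields(before, after).items():
    print(f"{name}: {was} -> {now}")
```

Fields that did not change, such as `isSOHGood`, are left out of the result.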
elukey closed this task as Resolved. · Jun 28 2018, 1:11 PM
Vvjjkkii renamed this task from analytics1032's BBU is not working correctly to xbdaaaaaaa. · Jul 1 2018, 1:11 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from xbdaaaaaaa to analytics1032's BBU is not working correctly. · Jul 2 2018, 6:10 AM
CommunityTechBot closed this task as Resolved.
CommunityTechBot claimed this task.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.