
analytics1057's BBU is faulty
Closed, Resolved, Public

Description

elukey@analytics1057:~$ sudo megacli -AdpBbuCmd -aALL

BBU status for Adapter: 0

BatteryType: BBU
Battery State: Unknown
  Battery backup charge time : 0 hours

BBU Capacity Info for Adapter: 0

  Relative State of Charge: 17 %
  Absolute State of charge: 0 %
  Remaining Capacity: 91 mAh
  Full Charge Capacity: 559 mAh
  Run time to empty: Battery is not being charged.
  Average time to empty: 7 Min.
  Estimated Time to full recharge: Battery is not being charged.
  Cycle Count: 4
Max Error = 0 %
Remaining Capacity Alarm = 0 mAh
Remining Time Alarm = 0 Min

BBU Design Info for Adapter: 0

  Date of Manufacture: 00/00, 0000
  Design Capacity: 460 mAh
  Design Voltage: 0 mV
  Specification Info: 0
  Serial Number: 0
  Pack Stat Configuration: 0x0000
  Manufacture Name: 0x113
  Firmware Version   : 0.3
  Device Name:
  Device Chemistry:
  Battery FRU: N/A
Module Version = 0.3
  Transparent Learn = 1
  App Data = 1

BBU Properties for Adapter: 0

  Auto Learn Period: 90 Days
  Next Learn time: None  Learn Delay Interval:0 Hours
  Auto-Learn Mode: Disabled

Exit Code: 0x00
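To triage output like the above on other hosts, the key fields can be pulled out and checked automatically. The following is a minimal sketch, not an existing Wikimedia tool. It assumes that megacli separates keys from values with either `:` or `=` (as seen above) and that a healthy battery reports `Battery State: Optimal`. The 50% charge threshold (`MIN_CHARGE_PCT`) and the function names are hypothetical.

```python
import re

# Assumption: megacli reports "Optimal" for a healthy battery.
HEALTHY_STATE = "Optimal"
# Hypothetical threshold, not taken from any production check.
MIN_CHARGE_PCT = 50


def parse_bbu_output(text: str) -> dict[str, str]:
    """Collect 'Key: value' and 'Key = value' pairs from `megacli -AdpBbuCmd` output."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:=]+?)\s*[:=]\s*(.*)$", line)
        if match:
            # Keep the first occurrence if a key repeats (e.g. several adapters).
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields


def bbu_problems(fields: dict[str, str]) -> list[str]:
    """Return human-readable reasons why the BBU looks unhealthy."""
    problems = []
    state = fields.get("Battery State", "missing")
    if state != HEALTHY_STATE:
        problems.append(f"battery state is {state}")
    if "not being charged" in fields.get("Estimated Time to full recharge", ""):
        problems.append("battery is not charging")
    charge = fields.get("Relative State of Charge", "")
    if charge and int(charge.split()[0]) < MIN_CHARGE_PCT:
        problems.append(f"relative charge is {charge}")
    return problems


# A few lines copied from the analytics1057 output above.
SAMPLE = """\
BatteryType: BBU
Battery State: Unknown
  Battery backup charge time : 0 hours
  Relative State of Charge: 17 %
  Estimated Time to full recharge: Battery is not being charged.
"""

print(bbu_problems(parse_bbu_output(SAMPLE)))
```

On the analytics1057 sample, all three heuristics fire: unknown state, no charging, and 17% relative charge.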

I tried to force a relearn, but it didn't work. Icinga keeps alerting with:

CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
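For context, the alert compares each logical drive's (LD's) current write cache policy against the expected `WriteBack`. A RAID controller will typically fall back to `WriteThrough` when its BBU is unusable. The following is a hypothetical re-implementation for illustration only, not the actual Icinga plugin. It assumes that `megacli -LDInfo -LAll -aAll` prints one `Current Cache Policy:` line per LD whose first comma-separated field is the write policy. The sample input is made up to mirror the 13 LDs above and was not captured from the host.

```python
# Hypothetical re-implementation of the RAID write-policy check, for illustration only.


def current_write_policies(ldinfo: str) -> list[str]:
    """Extract the write policy (first field) of each LD's 'Current Cache Policy' line."""
    policies = []
    for line in ldinfo.splitlines():
        line = line.strip()
        if line.startswith("Current Cache Policy:"):
            value = line.split(":", 1)[1]
            policies.append(value.split(",")[0].strip())
    return policies


def check_write_policy(policies: list[str], expected: str = "WriteBack") -> tuple[str, str]:
    """Return an Icinga-style (status, message) pair for the given LD policies."""
    wrong = [p for p in policies if p != expected]
    if not wrong:
        return "OK", f"all {len(policies)} LD(s) use {expected}"
    return (
        "CRITICAL",
        f"{len(wrong)} LD(s) must have write cache policy {expected}, "
        f"currently using: {', '.join(wrong)}",
    )


# Illustrative input: 13 LDs that fell back to WriteThrough (not real host output).
SAMPLE_LDINFO = "\n".join(
    "Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU"
    for _ in range(13)
)

status, message = check_write_policy(current_write_policies(SAMPLE_LDINFO))
print(f"{status}: {message}")
```

Passing `expected="WriteThrough"` makes the same input report OK, which corresponds to the per-host override discussed below.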

The host should be out of warranty (OOW), so a spare BBU would be great if we have one. Otherwise, we'll set a different expected cache policy for this host until it is refreshed :)

Event Timeline

elukey created this task. Nov 25 2019, 7:10 AM
mforns triaged this task as High priority. Nov 25 2019, 5:04 PM
mforns moved this task from Incoming to Operational Excellence on the Analytics board.

@elukey No spare BBU around.

@Jclark-ctr hi! In https://phabricator.wikimedia.org/T233080 analytics1032 needs to be decommissioned. Maybe we can try its BBU?

@elukey I'm unsure whether it is the same BBU, since the hosts are different models (720xd vs 730xd).

Change 555985 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add overrides to analytics1057 for raid check policy

https://gerrit.wikimedia.org/r/555985

Change 555985 merged by Elukey:
[operations/puppet@production] Add overrides to analytics1057 for raid check policy

https://gerrit.wikimedia.org/r/555985

elukey closed this task as Resolved. Dec 10 2019, 2:20 PM

I have set Puppet to expect WriteThrough instead of WriteBack on this host, so the alarms will go away. The host will be refreshed in the coming months.
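The override amounts to a per-host expected policy with a fleet-wide default. The real change is Puppet code in Gerrit change 555985, and its contents are not reproduced here. The following is only a hypothetical Python model of the idea, and the table name, function names, and `example-host` are illustrative.

```python
# Hypothetical model of a per-host override; the real change lives in operations/puppet.
DEFAULT_POLICY = "WriteBack"
POLICY_OVERRIDES = {
    # Faulty BBU and no spare: accept WriteThrough until the host is refreshed.
    "analytics1057": "WriteThrough",
}


def expected_policy(hostname: str) -> str:
    """Return the write policy the check should expect on this host."""
    return POLICY_OVERRIDES.get(hostname, DEFAULT_POLICY)


def mismatched_lds(hostname: str, policies: list[str]) -> list[str]:
    """Return the LD policies that differ from what this host is expected to use."""
    expected = expected_policy(hostname)
    return [p for p in policies if p != expected]


print(mismatched_lds("analytics1057", ["WriteThrough"] * 13))  # empty: no alarm
print(len(mismatched_lds("example-host", ["WriteThrough"] * 13)))  # 13: still alarms
```

In this sketch, the override also inverts the check for analytics1057. Any LD that went back to WriteBack would now be flagged, so the override should be removed once the host is refreshed.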

Mentioned in SAL (#wikimedia-operations) [2019-12-18T09:24:52Z] <elukey> execute 'megacli -LDSetProp WT -LAll -aAll' on analytics1057 - T239045