Broken BBU
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| db1085: Enable notifications | operations/puppet | production | +0 -1 |
| db1085: Disable notifications | operations/puppet | production | +2 -0 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| | | | Unknown Object (Task) |
| Resolved | | Marostegui | T258361 Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) |
| Declined | | None | T258386 db1080-95 batch possibly suffering BBU issues |
| Resolved | | Marostegui | T258360 db1085 crashed |
Event Timeline
BBU issues, as expected (this host is also scheduled to be refreshed next quarter). The iLO event log shows:
/system1/log1/record16
Targets
Properties
number=16
severity=Caution
date=07/19/2020
time=18:44
description=POST Error: 313-HPE Smart Storage Battery 1 Failure - Battery Shutdown Event Code: 0x0400. Action: Restart system. Contact HPE support if condition persists.
Verbs
cd version exit show
</system1/log1>hpiLO-> show record15
status=0
status_tag=COMMAND COMPLETED
Sun Jul 19 18:48:08 2020
/system1/log1/record15
Targets
Properties
number=15
severity=Critical
date=07/19/2020
time=18:43
description=ASR Detected by System ROM
Verbs
cd version exit show
</system1/log1>hpiLO-> show record14
status=0
status_tag=COMMAND COMPLETED
Sun Jul 19 18:48:11 2020
/system1/log1/record14
Targets
Properties
number=14
severity=Caution
date=07/19/2020
time=18:26
description=Smart Storage Battery failure (Battery 1, service information: 0x0A). Action: Gather AHS log and contact Support
Verbs
cd version exit show
Pretty much the same issue as T258336
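The three iLO records quoted above were shown newest first. As a rough illustration, they can be put back in chronological order with a small parser. This is only a sketch: the function name `parse_ilo_records` and the sample constant are hypothetical, and it assumes every record carries the `number`, `date`, and `time` properties exactly as they appear in this task.

```python
from datetime import datetime

# Property lines copied from the iLO records quoted in this task.
ILO_SAMPLE = """\
number=16
severity=Caution
date=07/19/2020
time=18:44
description=POST Error: 313-HPE Smart Storage Battery 1 Failure - Battery Shutdown Event Code: 0x0400. Action: Restart system. Contact HPE support if condition persists.
status=0
status_tag=COMMAND COMPLETED
number=15
severity=Critical
date=07/19/2020
time=18:43
description=ASR Detected by System ROM
status=0
status_tag=COMMAND COMPLETED
number=14
severity=Caution
date=07/19/2020
time=18:26
description=Smart Storage Battery failure (Battery 1, service information: 0x0A). Action: Gather AHS log and contact Support
"""

WANTED = ("severity", "date", "time", "description")


def parse_ilo_records(text):
    """Return the log records as dicts, sorted oldest first (hypothetical helper)."""
    records = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # headings such as "Targets" or "Verbs"
        if key == "number":
            # A "number=" property starts a new record.
            current = {"number": int(value)}
            records.append(current)
        elif current is not None and key in WANTED:
            current[key] = value
    # Keep only records that have both date and time, then attach a timestamp.
    complete = [r for r in records if "date" in r and "time" in r]
    for r in complete:
        r["when"] = datetime.strptime(f"{r['date']} {r['time']}", "%m/%d/%Y %H:%M")
    return sorted(complete, key=lambda r: r["when"])


for rec in parse_ilo_records(ILO_SAMPLE):
    print(rec["when"].isoformat(), rec["severity"], rec["description"][:60])
```

Read in this order, the battery failure at 18:26 precedes the ASR at 18:43 and the POST error at 18:44.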
And the BBU is gone:
root@db1085:~# hpssacli controller all show detail | grep -i Battery
No-Battery Write Cache: Disabled
Battery/Capacitor Count: 0
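The same check could be scripted rather than read by eye. The sketch below only parses the two fields shown in the output above. The helper names `parse_controller_fields` and `bbu_present` are hypothetical, and it assumes the `Key: Value` layout seen here, which may differ between hpssacli versions.

```python
# The two fields from the grep output in this task, one per line.
HPSSACLI_SAMPLE = """\
No-Battery Write Cache: Disabled
Battery/Capacitor Count: 0
"""


def parse_controller_fields(text):
    """Split 'Key: Value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def bbu_present(fields):
    """True only if the controller reports at least one battery or capacitor."""
    count = fields.get("Battery/Capacitor Count")
    return count is not None and count.isdigit() and int(count) > 0


fields = parse_controller_fields(HPSSACLI_SAMPLE)
if not bbu_present(fields):
    print("No BBU detected; No-Battery Write Cache:",
          fields.get("No-Battery Write Cache", "unknown"))
```

A missing count is treated as "no BBU" here, on the assumption that a false alarm is preferable to a missed failure.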
Change 614590 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1085: Disable notifications
Change 614590 merged by Marostegui:
[operations/puppet@production] db1085: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2020-07-19T19:16:10Z] <marostegui> Upgrade and reboot db1085 T258360
Change 615165 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1085: Enable notifications
Change 615165 merged by Marostegui:
[operations/puppet@production] db1085: Enable notifications
Mentioned in SAL (#wikimedia-operations) [2020-07-21T10:45:46Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1085 T258360', diff saved to https://phabricator.wikimedia.org/P11985 and previous config saved to /var/cache/conftool/dbconfig/20200721-104546-marostegui.json
Mentioned in SAL (#wikimedia-operations) [2020-07-21T10:58:52Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1085 T258360', diff saved to https://phabricator.wikimedia.org/P11986 and previous config saved to /var/cache/conftool/dbconfig/20200721-105852-marostegui.json
Mentioned in SAL (#wikimedia-operations) [2020-07-21T11:08:55Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1085 T258360', diff saved to https://phabricator.wikimedia.org/P11987 and previous config saved to /var/cache/conftool/dbconfig/20200721-110854-marostegui.json