BBU broke: T245621#5897114
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
db1084: Disable notifications | operations/puppet | production | +1 -0
db1084: Disable notifications | operations/puppet | production | +1 -0
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Marostegui | T245621 db1084 crashed due to BBU failure
Resolved | | Jclark-ctr | T245647 Replace broken BBU on db1084 (HP host)
Event Timeline
Looks like BBU died:
Battery/Capacitor Count: 0
/system1/log1/record15
  Targets
  Properties
    number=15
    severity=Caution
    date=02/19/2020
    time=13:19
    description=Smart Storage Battery failure (Battery 1, service information: 0x0A). Action: Gather AHS log and contact Support
  Verbs
    cd version exit show

/system1/log1/record17
  Targets
  Properties
    number=17
    severity=Caution
    date=02/19/2020
    time=13:32
    description=POST Error: 313-HPE Smart Storage Battery 1 Failure - Battery Shutdown Event Code: 0x0400. Action: Restart system. Contact HPE support if condition persists.
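For anyone who wants to pull these fields out of a record dump programmatically, here is a minimal sketch. It assumes the record arrives as a single string of `key=value` properties with only the five keys seen above; the function name `parse_ilo_record` is hypothetical and not part of any HPE tooling.

```python
import re

# Keys observed in the records above (assumption: no other keys appear).
_KEYS = re.compile(r"\b(number|severity|date|time|description)=")


def parse_ilo_record(text):
    """Split one log record's properties into a dict of strings."""
    record = {}
    matches = list(_KEYS.finditer(text))
    for i, match in enumerate(matches):
        # A value runs until the next known key, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        record[match.group(1)] = text[match.end():end].strip()
    return record


sample = (
    "number=15 severity=Caution date=02/19/2020 time=13:19 "
    "description=Smart Storage Battery failure (Battery 1, service "
    "information: 0x0A). Action: Gather AHS log and contact Support"
)
rec = parse_ilo_record(sample)
print(rec["severity"], rec["date"], rec["time"])
print(rec["description"])
```

Splitting on known keys rather than on whitespace keeps the free-text description, which itself contains spaces, in one piece.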
Change 573288 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1084: Disable notifications
Change 573288 merged by Marostegui:
[operations/puppet@production] db1084: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2020-02-19T14:02:43Z] <marostegui> Start mysql on db1084 without replication - T245621
Mentioned in SAL (#wikimedia-operations) [2020-02-19T14:07:19Z] <marostegui> Upgrade and reboot db1084 - T245621
Mentioned in SAL (#wikimedia-operations) [2020-02-19T14:29:31Z] <marostegui> Data checksum on db1084 T245621
@Marostegui - we have a few spare BBUs in the process of being shipped onsite, one of them for T244958, which should be arriving early next week. You can just open a dc-ops task with us, and we can have it taken care of. Thanks, Willy
Data checksum has finished without issues, so I am going to slowly repool this host so it can at least serve some traffic.
Mentioned in SAL (#wikimedia-operations) [2020-02-20T06:24:46Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1084 after crash - T245621', diff saved to https://phabricator.wikimedia.org/P10466 and previous config saved to /var/cache/conftool/dbconfig/20200220-062445-marostegui.json
Mentioned in SAL (#wikimedia-operations) [2020-02-20T09:12:33Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1084 after crash - T245621', diff saved to https://phabricator.wikimedia.org/P10467 and previous config saved to /var/cache/conftool/dbconfig/20200220-091233-marostegui.json
Mentioned in SAL (#wikimedia-operations) [2020-02-20T10:51:18Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1084 after crash - T245621', diff saved to https://phabricator.wikimedia.org/P10468 and previous config saved to /var/cache/conftool/dbconfig/20200220-105117-marostegui.json
Change 574923 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1084: Disable notifications
Change 574923 merged by Marostegui:
[operations/puppet@production] db1084: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2020-02-27T11:45:44Z] <jynus@cumin1001> dbctl commit (dc=all): 'Repool db1084 at 10% T245621', diff saved to https://phabricator.wikimedia.org/P10538 and previous config saved to /var/cache/conftool/dbconfig/20200227-114542-jynus.json
Mentioned in SAL (#wikimedia-operations) [2020-02-27T15:03:03Z] <jynus@cumin1001> dbctl commit (dc=all): 'Repool db1084 at 50% T245621', diff saved to https://phabricator.wikimedia.org/P10542 and previous config saved to /var/cache/conftool/dbconfig/20200227-150302-jynus.json
I will let @Marostegui put it back to 100%, do the full revert and finishing touches, and resolve the task.
Mentioned in SAL (#wikimedia-operations) [2020-02-28T06:25:37Z] <marostegui@cumin1001> dbctl commit (dc=all): '75% of original weight to db1084 - T245621', diff saved to https://phabricator.wikimedia.org/P10549 and previous config saved to /var/cache/conftool/dbconfig/20200228-062536-marostegui.json
Mentioned in SAL (#wikimedia-operations) [2020-02-28T06:40:37Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1084 - T245621', diff saved to https://phabricator.wikimedia.org/P10550 and previous config saved to /var/cache/conftool/dbconfig/20200228-064037-marostegui.json
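The staged repool logged above (10%, 50%, 75%, then full weight) comes down to simple arithmetic on the host's original weight. A rough sketch follows; the original weight of 200 is a made-up value for illustration, and `repool_weights` is a hypothetical helper, not a dbctl feature.

```python
def repool_weights(original_weight, steps=(10, 50, 75, 100)):
    """Return (percentage, weight) pairs for a staged repool.

    The default steps mirror the percentages in the SAL entries above.
    Weights are rounded down to whole numbers.
    """
    return [(pct, original_weight * pct // 100) for pct in steps]


# Hypothetical original weight; the task does not state the real one.
schedule = repool_weights(200)
for pct, weight in schedule:
    print(f"{pct:>3}% of original weight -> {weight}")
```

Each step would still be applied and committed by hand, with time between steps to watch the host under load.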