db2088 crashed on Saturday:
12:55:21 <+icinga-wm> PROBLEM - Host db2088 is DOWN: PING CRITICAL - Packet loss = 100%
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
db2088: Disable notifications | operations/puppet | production | +1 -0 |
Change 801171 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db2088: Disable notifications
Change 801171 merged by Marostegui:
[operations/puppet@production] db2088: Disable notifications
Mentioned in SAL (#wikimedia-operations) [2022-05-30T05:35:00Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2088 (s1 and s2) T309485', diff saved to https://phabricator.wikimedia.org/P28913 and previous config saved to /var/cache/conftool/dbconfig/20220530-053459-marostegui.json
@Papaul db2088's mgmt interface is also unavailable, so I cannot check the logs or tell whether the host is up and only the network failed.
Can you check on-site?
Thank you!
I removed the power for 10 minutes and the server came back up. The iDRAC log is not showing any HW issues. I upgraded the BIOS and iDRAC on the node. The server is back up.
Thanks Papaul. I can indeed access the host now.
MySQL seems to be fine.
I am going to repool this host once it catches up and close this. If it happens again, we can probably decommission it as it is scheduled for refresh and the replacement hardware has been ordered and will arrive in a few months. We'll probably not have a DC switchover before that anyways.
Mentioned in SAL (#wikimedia-operations) [2022-06-02T05:14:52Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db2088 (s1 and s2) T309485', diff saved to https://phabricator.wikimedia.org/P29327 and previous config saved to /var/cache/conftool/dbconfig/20220602-051451-marostegui.json
db2088 is back in sync with both the s1 and s2 masters. I have repooled it. Closing this for now. If it happens again we should probably just decommission it.
Thank you Papaul!