Bring db1127 back into service
Closed, Resolved (Public)

Description

The bad RAM has been replaced, so the host can now be brought back into service.

Event Timeline

LSobanski triaged this task as Medium priority. Oct 11 2021, 10:28 AM
LSobanski moved this task from Triage to Ready on the DBA board.

Because MariaDB on db1127 crashed due to RAM errors, and the machine was also unexpectedly powered off, I'm restoring the database data from an s7 snapshot.

Snapshot restored, catching up on replication now.

Change 730146 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] db1127: Re-enable notifications

https://gerrit.wikimedia.org/r/730146

Change 730146 merged by Kormat:

[operations/puppet@production] db1127: Re-enable notifications

https://gerrit.wikimedia.org/r/730146

Mentioned in SAL (#wikimedia-operations) [2021-10-12T08:31:04Z] <kormat@cumin1001> dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json

Mentioned in SAL (#wikimedia-operations) [2021-10-12T08:46:07Z] <kormat@cumin1001> dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json

Mentioned in SAL (#wikimedia-operations) [2021-10-12T09:01:11Z] <kormat@cumin1001> dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json

Mentioned in SAL (#wikimedia-operations) [2021-10-12T09:16:15Z] <kormat@cumin1001> dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 T292956', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
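
The SAL entries above show db1127 being repooled in four steps (25%, 50%, 75%, 100%) roughly 15 minutes apart. As a rough illustration of that ramp, here is a minimal Python sketch that generates such a schedule. It is not how dbctl or any Wikimedia tooling is implemented: the function name `repool_schedule` is hypothetical, and the 15-minute interval is an assumption inferred from the log timestamps.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval=timedelta(minutes=15)):
    """Return (timestamp, percentage) pairs for a staged repool.

    Hypothetical helper: the step values come from the SAL entries,
    and the 15-minute interval is assumed from their timestamps.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


# Start time taken from the first SAL entry, truncated to the minute.
start = datetime(2021, 10, 12, 8, 31)
for when, pct in repool_schedule(start):
    print(f"{when:%H:%M}Z  db1127 pooled at {pct}%")
```

Ramping up gradually like this gives the freshly restored host time to warm its caches, and limits the impact if the host misbehaves again before it carries full traffic.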