
Fully format and reclone db1246
Closed, Resolved, Public

Description

db1246 crashed and its filesystem was corrupted. It needs the following:

  • Reimage + full format of / and /srv
  • Reclone
  • Repool

There might still be some HW-related work as part of T363119, but that can be done at a later time.

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.

Change #1034489 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] db1246: Switch to puppet 7

https://gerrit.wikimedia.org/r/1034489

Change #1034489 merged by Kormat:

[operations/puppet@production] db1246: Switch to puppet 7

https://gerrit.wikimedia.org/r/1034489

Cookbook cookbooks.sre.hosts.reimage was started by kormat@cumin1002 for host db1246.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by kormat@cumin1002 for host db1246.eqiad.wmnet with OS bookworm completed:

  • db1246 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405211318_kormat_1510993_db1246.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1034178 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] Revert "installserver: Allowing formatting db1246"

https://gerrit.wikimedia.org/r/1034178

Change #1034178 merged by Kormat:

[operations/puppet@production] Revert "installserver: Allowing formatting db1246"

https://gerrit.wikimedia.org/r/1034178

Mentioned in SAL (#wikimedia-operations) [2024-05-21T14:59:25Z] <kormat@cumin1002> dbctl commit (dc=all): 'Depooling db1182 as cloning source T364552', diff saved to https://phabricator.wikimedia.org/P62785 and previous config saved to /var/cache/conftool/dbconfig/20240521-145924-kormat.json

Cloning from db1182 completed successfully.

Mentioned in SAL (#wikimedia-operations) [2024-05-22T10:24:18Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1182 (re)pooling @ 15%: repool clone source T364552', diff saved to https://phabricator.wikimedia.org/P62858 and previous config saved to /var/cache/conftool/dbconfig/20240522-102418-kormat.json

Change #1034871 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] Revert "db1246: Disable notifications"

https://gerrit.wikimedia.org/r/1034871

Change #1034871 abandoned by Kormat:

[operations/puppet@production] Revert "db1246: Disable notifications"

Reason:

Simpler to just make the change directly instead of using revert

https://gerrit.wikimedia.org/r/1034871

Change #1034859 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] db1246: Enable notifications.

https://gerrit.wikimedia.org/r/1034859

Kormat changed the task status from Open to In Progress. Wed, May 22, 10:35 AM
Kormat triaged this task as Medium priority.

Mentioned in SAL (#wikimedia-operations) [2024-05-22T10:39:27Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1182 (re)pooling @ 30%: repool clone source T364552', diff saved to https://phabricator.wikimedia.org/P62860 and previous config saved to /var/cache/conftool/dbconfig/20240522-103924-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T10:54:33Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1182 (re)pooling @ 45%: repool clone source T364552', diff saved to https://phabricator.wikimedia.org/P62862 and previous config saved to /var/cache/conftool/dbconfig/20240522-105432-kormat.json

Change #1034859 merged by Kormat:

[operations/puppet@production] db1246: Enable notifications.

https://gerrit.wikimedia.org/r/1034859

Mentioned in SAL (#wikimedia-operations) [2024-05-22T11:09:39Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1182 (re)pooling @ 60%: repool clone source T364552', diff saved to https://phabricator.wikimedia.org/P62864 and previous config saved to /var/cache/conftool/dbconfig/20240522-110938-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T11:24:46Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: repool clone source T364552', diff saved to https://phabricator.wikimedia.org/P62866 and previous config saved to /var/cache/conftool/dbconfig/20240522-112444-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T11:39:52Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1182 (re)pooling @ 90%: repool clone source T364552', diff saved to https://phabricator.wikimedia.org/P62869 and previous config saved to /var/cache/conftool/dbconfig/20240522-113952-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T11:54:58Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: repool clone source T364552', diff saved to https://phabricator.wikimedia.org/P62872 and previous config saved to /var/cache/conftool/dbconfig/20240522-115458-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T11:56:34Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: Repool db1246 T364552', diff saved to https://phabricator.wikimedia.org/P62873 and previous config saved to /var/cache/conftool/dbconfig/20240522-115633-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T12:11:41Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1246 (re)pooling @ 30%: Repool db1246 T364552', diff saved to https://phabricator.wikimedia.org/P62877 and previous config saved to /var/cache/conftool/dbconfig/20240522-121139-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T12:26:48Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1246 (re)pooling @ 45%: Repool db1246 T364552', diff saved to https://phabricator.wikimedia.org/P62881 and previous config saved to /var/cache/conftool/dbconfig/20240522-122647-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T12:57:01Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repool db1246 T364552', diff saved to https://phabricator.wikimedia.org/P62889 and previous config saved to /var/cache/conftool/dbconfig/20240522-125659-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T13:12:07Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1246 (re)pooling @ 90%: Repool db1246 T364552', diff saved to https://phabricator.wikimedia.org/P62893 and previous config saved to /var/cache/conftool/dbconfig/20240522-131206-kormat.json

Mentioned in SAL (#wikimedia-operations) [2024-05-22T13:27:13Z] <kormat@cumin1002> dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repool db1246 T364552', diff saved to https://phabricator.wikimedia.org/P62898 and previous config saved to /var/cache/conftool/dbconfig/20240522-132712-kormat.json

db1246 fully repooled.
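
The SAL entries above show both hosts being repooled in 15% steps roughly 15 minutes apart, ending at 100%. As a minimal sketch of that schedule (not the tooling actually used here), the following Python computes the steps and the commit messages in the format seen in the log. The function names `repool_schedule` and `commit_message` are hypothetical, and the fixed 15-minute interval is an assumption read off the timestamps. The timeline has no 60% entry for db1246, so that step in the sketch's output is part of the assumed schedule, not a record of what ran.

```python
from datetime import datetime, timedelta, timezone


def repool_schedule(start, step_pct=15, interval=timedelta(minutes=15)):
    """Yield (time, percentage) pairs for a staged repool.

    Hypothetical helper: percentages grow by step_pct per step and the
    final step is always exactly 100%.
    """
    pct = step_pct
    when = start
    while pct < 100:
        yield when, pct
        pct += step_pct
        when += interval
    yield when, 100


def commit_message(host, pct, reason, task):
    """Format a message like the dbctl commit messages quoted in the log."""
    return f"{host} (re)pooling @ {pct}%: {reason} {task}"


# Start time taken from the first db1246 repool entry (seconds dropped).
start = datetime(2024, 5, 22, 11, 56, tzinfo=timezone.utc)
steps = list(repool_schedule(start))

for when, pct in steps:
    print(when.strftime("%H:%M"),
          commit_message("db1246", pct, "Repool db1246", "T364552"))
```

With these assumptions the schedule has seven steps and reaches 100% ninety minutes after the first one, which is consistent with the 11:56 to 13:27 span recorded for db1246.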