
Provision new RESTBase cluster nodes: restbase20[28-35]
Closed, Resolved (Public)

Description

Provision new Cassandra hosts (as replacements/refreshes for restbase20[13-20]).

  • restbase2028
  • restbase2029
  • restbase2030
  • restbase2031
  • restbase2032
  • restbase2033
  • restbase2034
  • restbase2035

See also: T349758: Q1:rack/setup/install restbase20[28-35]
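The title and the related task use bracketed host ranges such as restbase20[28-35] (and, later in the timeline, restbase203[3-5]), the shorthand used by ClusterShell-style host selectors like cumin's. To show what the notation denotes, here is a minimal sketch of a hypothetical helper, `expand_hosts`, that expands a single numeric range. It is not the NodeSet parser those tools actually use, and it ignores commas and multiple ranges.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracketed numeric range, e.g. 'restbase20[28-35]'.

    Hypothetical helper: supports a single [start-end] range per pattern only,
    not the full ClusterShell/cumin NodeSet syntax.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if m is None:
        return [pattern]  # no range present: the pattern is a single host name
    prefix, start, end, suffix = m.groups()
    width = len(start)  # keep zero-padding such as [01-03]
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(start), int(end) + 1)]


print(expand_hosts("restbase203[3-5]"))
# ['restbase2033', 'restbase2034', 'restbase2035']
```

Applied to restbase20[28-35], the helper yields exactly the eight hosts listed in the description.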

Event Timeline

Eevans triaged this task as Medium priority. Nov 30 2023, 7:01 PM

Host rebooted by eevans@cumin1001 with reason: Apply new network settings

Change 979161 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2028

https://gerrit.wikimedia.org/r/979161

Change 979161 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2028

https://gerrit.wikimedia.org/r/979161

Change 980049 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: migrate restbase2028 to puppet 7

https://gerrit.wikimedia.org/r/980049

Change 980049 merged by Eevans:

[operations/puppet@production] restbase: migrate restbase2028 to puppet 7

https://gerrit.wikimedia.org/r/980049

Change 980884 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2029

https://gerrit.wikimedia.org/r/980884

Change 980884 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2029

https://gerrit.wikimedia.org/r/980884

Mentioned in SAL (#wikimedia-operations) [2023-12-06T16:29:11Z] <urandom> bootstrapping Cassandra/restbase2020-a — T352468

Change 981371 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2030

https://gerrit.wikimedia.org/r/981371

Change 981371 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2030

https://gerrit.wikimedia.org/r/981371

Mentioned in SAL (#wikimedia-operations) [2023-12-07T20:05:51Z] <urandom> bootstrap Cassandra/restbase2030-a — T352468

Change 981601 had a related patch set uploaded (by Eevans; author: Eevans):

[labs/private@master] keys & certs for missing restbase nodes

https://gerrit.wikimedia.org/r/981601

Change 981605 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2031

https://gerrit.wikimedia.org/r/981605

Change 981606 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2032

https://gerrit.wikimedia.org/r/981606

Change 981607 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2033

https://gerrit.wikimedia.org/r/981607

Change 981608 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2034

https://gerrit.wikimedia.org/r/981608

Change 981609 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2035

https://gerrit.wikimedia.org/r/981609

Change 981605 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2031

https://gerrit.wikimedia.org/r/981605

Change 981606 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2032

https://gerrit.wikimedia.org/r/981606

restbase2032 was erroneously bootstrapped into row B (should be row C), and will have to be decommissioned (and re-bootstrapped). :(
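As noted above, a node that joined the cluster with the wrong row could not simply be reconfigured; restbase2032 had to be decommissioned and re-bootstrapped. A cheap guard is to compare each new host's configured Cassandra rack against the row it is physically racked in before bootstrapping. This is a minimal sketch that assumes both values are already available (for example, the physical row from Netbox and the configured rack from the host's Puppet/hiera config); `HostPlacement` and `misplaced` are hypothetical names, and the single-letter row values mirror the comment above rather than any real config format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostPlacement:
    """Hypothetical record pairing a host's physical row with its configured rack."""
    host: str
    physical_row: str     # row the server is racked in (assumed source: Netbox)
    configured_rack: str  # rack/row set in the node's Cassandra config (assumed source: hiera)


def misplaced(hosts: list[HostPlacement]) -> list[str]:
    """Return the hosts whose configured rack differs from their physical row."""
    return [h.host for h in hosts if h.configured_rack != h.physical_row]


# The restbase2032 mistake from this task: configured as row B, racked in row C.
print(misplaced([HostPlacement("restbase2032", physical_row="C", configured_rack="B")]))
# ['restbase2032']
```

Running such a check over every pending host before the bootstrap step would flag a mismatch while it is still a config fix rather than a decommission.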

Change 983431 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase2032: fix erroneous row

https://gerrit.wikimedia.org/r/983431

Change 983431 merged by Eevans:

[operations/puppet@production] restbase2032: fix erroneous row

https://gerrit.wikimedia.org/r/983431

Change 984204 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] Fix insetup role for restbase203[3-5]

https://gerrit.wikimedia.org/r/984204

Change 984204 merged by Alexandros Kosiaris:

[operations/puppet@production] Fix insetup role for restbase203[3-5]

https://gerrit.wikimedia.org/r/984204

Change 981607 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2033

https://gerrit.wikimedia.org/r/981607

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye executed with errors:

  • restbase2033 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202312201951_eevans_1186823_restbase2033.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details
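Each reimage summary posted here has the same shape: a host bullet marked PASS or FAIL, followed by one bullet per step. That makes it easy to compare the repeated restbase2033 attempts by how far each got; the first attempt above reached the final reboot, while the next one stopped right after the BIOS boot-parameter check. This is a minimal sketch that assumes only the plain-text summary format shown on this task, not the cookbook's own log files under /var/log/spicerack; `parse_reimage_summary` and `last_step_before_failure` are hypothetical names.

```python
import re
from typing import Optional

STEP_MARK = "•"
FAILURE_PREFIX = "The reimage failed"


def parse_reimage_summary(text: str) -> tuple[str, str, list[str]]:
    """Split a posted cookbook summary into (host, status, steps).

    Assumes the layout seen on this task: a '• <host> (PASS|FAIL)' bullet
    followed by one '• <step>' bullet per step, in order.
    """
    host, status, steps = "", "", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith(STEP_MARK):
            continue  # skip header lines and blank lines
        item = line[len(STEP_MARK):].strip()
        match = re.fullmatch(r"(\S+) \((PASS|FAIL)\)", item)
        if match and not host:
            host, status = match.groups()
        else:
            steps.append(item)
    return host, status, steps


def last_step_before_failure(steps: list[str]) -> Optional[str]:
    """Return the last step reported before the failure marker (or the last step overall)."""
    completed = [s for s in steps if not s.startswith(FAILURE_PREFIX)]
    return completed[-1] if completed else None


summary = """\
  • restbase2033 (FAIL)
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details
"""
host, status, steps = parse_reimage_summary(summary)
print(host, status, last_step_before_failure(steps))
# restbase2033 FAIL Checked BIOS boot parameters are back to normal
```

The parser only narrows down which phase failed; the actual cause still has to be read from the cookbook logs the summary points to.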

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye executed with errors:

  • restbase2033 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye executed with errors:

  • restbase2033 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202312202108_eevans_1197116_restbase2033.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by eevans@cumin1002 for host restbase2033.codfw.wmnet with OS bullseye completed:

  • restbase2033 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202312202148_eevans_1204161_restbase2033.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 984647 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: set production role and add config for restbase2033

https://gerrit.wikimedia.org/r/984647

Change 984647 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2033

https://gerrit.wikimedia.org/r/984647

Change 981608 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2034

https://gerrit.wikimedia.org/r/981608

Change 981601 merged by Eevans:

[labs/private@master] restbase: add missing keys & certs, remove obsolete

https://gerrit.wikimedia.org/r/981601

Change 981609 merged by Eevans:

[operations/puppet@production] restbase: set production role and add config for restbase2035

https://gerrit.wikimedia.org/r/981609

Change 989248 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] restbase: configure new hosts for partition reuse

https://gerrit.wikimedia.org/r/989248

Change 989248 merged by Eevans:

[operations/puppet@production] restbase: configure new hosts for partition reuse

https://gerrit.wikimedia.org/r/989248