Migrate production Elastic clusters to Opensearch
Open, In Progress, Medium, Public

Description

We have successfully migrated our smaller clusters from Elastic to OpenSearch (see T380752 and T387904), so it is now time to migrate our production clusters as well.

Current status

Progress by number of hosts (updated at end of shift):
overall: 53/116 hosts migrated
codfw: 53/61 hosts migrated

  • by row (non-masters only; masters will be migrated last):
    • A
    • B
    • C in progress
    • D
    • currently broken:
      • cirrussearch2078 - continually failing PXE (ref T392644)
      • cirrussearch2091 - continually failing PXE and has a history of hardware issues (ref T391639)

eqiad: 0/55 hosts migrated
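
As a quick consistency check on the figures above, the per-site counts can be summed and turned into percentages. This is only a sketch: the numbers are copied from this status block, and the `progress` helper is a hypothetical name used for illustration.

```python
# Host counts copied from the status block above.
migrated = {"codfw": 53, "eqiad": 0}
total = {"codfw": 61, "eqiad": 55}


def progress(done: int, out_of: int) -> str:
    """Format a 'done/out_of (percent)' progress string."""
    return f"{done}/{out_of} ({done / out_of:.0%})"


for site in total:
    print(site, progress(migrated[site], total[site]))

# The overall line should equal the sum of the per-site lines.
print("overall", progress(sum(migrated.values()), sum(total.values())))
```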

Open patches
See the "elastic-2-opensearch" Gerrit topic

Checking progress manually

You can check site.pp in the Puppet repo to see how many hosts are using the old elasticsearch::cirrus role vs. the new cirrus::opensearch role.

If you have cumin access, you can also run sudo cumin O:elasticsearch::cirrus to match hosts on the old role and sudo cumin O:cirrus::opensearch to match hosts on the new role.
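
The site.pp check can be roughed out in a few lines of Python. This is a minimal sketch that assumes roles are declared with a `role(...)` call inside node blocks; the `SAMPLE` text is a hypothetical excerpt, not copied from the real manifest, and `count_roles` is an illustrative helper name.

```python
import re
from collections import Counter

OLD_ROLE = "elasticsearch::cirrus"
NEW_ROLE = "cirrus::opensearch"

# Assumption: roles appear as role(<name>) on their own line.
ROLE_RE = re.compile(r"\brole\(\s*([\w:]+)\s*\)")


def count_roles(site_pp_text: str) -> Counter:
    """Count role() declarations by role name, skipping commented-out lines."""
    counts = Counter()
    for line in site_pp_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        counts.update(ROLE_RE.findall(line))
    return counts


# Hypothetical excerpt for illustration only.
SAMPLE = r"""
node /^elastic20(5[5-9]|60)\.codfw\.wmnet$/ {
    role(elasticsearch::cirrus)
}
node /^cirrussearch21\d\d\.codfw\.wmnet$/ {
    role(cirrus::opensearch)
}
# role(elasticsearch::cirrus)
"""

counts = count_roles(SAMPLE)
print("old role declarations:", counts[OLD_ROLE])
print("new role declarations:", counts[NEW_ROLE])
```

Note that this counts node declarations, not hosts: a single node regex can cover many hosts, so the cumin queries remain the more reliable way to get per-host numbers.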

Creating this ticket to:

  • Create a plan. Documented here
  • Notify stakeholders
  • Migrate each host from Elastic to OpenSearch and confirm operation
    • also rename each host: s/elastic/cirrussearch/
  • Update docs
  • Roll back any config we temporarily changed for the migration (T391350)
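
The rename rule in the list above (s/elastic/cirrussearch/) can be expressed as a small helper. This is a sketch only: `new_name` is a hypothetical function, and it assumes old hostnames are `elastic` followed directly by digits, optionally with a domain suffix. The actual rename on hosts is done by the sre.hosts.rename cookbook, not by this code.

```python
import re

# Assumption: old names look like elastic<digits>[.<domain>].
HOST_RE = re.compile(r"^elastic(\d+)(\..+)?$")


def new_name(old: str) -> str:
    """Map an old elasticNNNN hostname to its cirrussearchNNNN equivalent."""
    match = HOST_RE.match(old)
    if match is None:
        raise ValueError(f"not an elastic hostname: {old!r}")
    number, domain = match.group(1), match.group(2) or ""
    return f"cirrussearch{number}{domain}"


print(new_name("elastic2101"))
print(new_name("elastic2061.codfw.wmnet"))
```

Anchoring the pattern on the digits keeps the helper from rewriting unrelated strings that merely start with "elastic", such as the elasticsearch::cirrus role name.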

Details

Other Assignee
RKemper
Repo                         Branch      Lines +/-
operations/puppet            production  +273 -1
operations/puppet            production  +22 -2
operations/puppet            production  +0 -12
operations/puppet            production  +0 -1
operations/puppet            production  +0 -3
operations/puppet            production  +0 -1
operations/puppet            production  +0 -1
operations/puppet            production  +2 -0
operations/puppet            production  +651 -651
operations/puppet            production  +1 -1
operations/puppet            production  +16 -6
operations/puppet            production  +6 -5
operations/puppet            production  +4 -5
operations/cookbooks         master      +14 -3
operations/puppet            production  +4 -4
operations/puppet            production  +12 -10
operations/puppet            production  +28 -2
operations/puppet            production  +1 -1
operations/puppet            production  +8 -8
operations/puppet            production  +4 -0
operations/puppet            production  +8 -7
operations/puppet            production  +3 -6
operations/puppet            production  +0 -11
operations/puppet            production  +1 -0
operations/puppet            production  +1 -1
operations/puppet            production  +1 -1
operations/puppet            production  +1 -1
operations/puppet            production  +1 -1
operations/puppet            production  +6 -1
operations/puppet            production  +5 -4
operations/puppet            production  +13 -1
operations/puppet            production  +1 -5
operations/puppet            production  +40 -40
operations/puppet            production  +5 -1
operations/puppet            production  +1 -1
operations/puppet            production  +0 -11
operations/puppet            production  +5 -7
operations/puppet            production  +43 -43
operations/cookbooks         master      +11 -2
operations/puppet            production  +1 -0
operations/puppet            production  +19 -5
operations/puppet            production  +1 -1
operations/puppet            production  +94 -0
operations/puppet            production  +2 -0
operations/puppet            production  +16 -18
operations/puppet            production  +13 -16
operations/puppet            production  +3 -3
operations/puppet            production  +6 -6
operations/puppet            production  +4 -6
operations/puppet            production  +1 -1
operations/puppet            production  +2 -2
operations/puppet            production  +12 -11
operations/puppet            production  +2 -2
operations/puppet            production  +2 -2
operations/puppet            production  +0 -0
operations/puppet            production  +1 -1
operations/puppet            production  +1 -0
operations/puppet            production  +1 -1
operations/puppet            production  +0 -0
operations/puppet            production  +0 -1
operations/puppet            production  +6 -9
operations/mediawiki-config  master      +3 -3
operations/mediawiki-config  master      +3 -3
operations/puppet            production  +101 -6
operations/puppet            production  +44 -0
operations/puppet            production  +87 -0
operations/mediawiki-config  master      +2 -2
operations/puppet            production  +6 -6
operations/puppet            production  +5 -0
operations/puppet            production  +47 -5
operations/mediawiki-config  master      +1 -1
operations/puppet            production  +10 -0
operations/puppet            production  +462 -0

Event Timeline


Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2101 to cirrussearch2101 completed:

  • elastic2101 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2101.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2102 to cirrussearch2102 completed:

  • elastic2102 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2102.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2101.codfw.wmnet with OS bullseye completed:

  • cirrussearch2101 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504231547_bking_2035727_cirrussearch2101.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1138406 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: add more newly-reimaged hosts to conftool

https://gerrit.wikimedia.org/r/1138406

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2102.codfw.wmnet with OS bullseye completed:

  • cirrussearch2102 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504231609_bking_2087982_cirrussearch2102.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2113 to cirrussearch2113 completed:

  • elastic2113 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2113.codfw.wmnet with OS bullseye

Change #1138406 merged by Bking:

[operations/puppet@production] cirrussearch: add more newly-reimaged hosts to conftool

https://gerrit.wikimedia.org/r/1138406

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2113.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2113 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2113.codfw.wmnet" to get a root shell, but depending on the failure this may not work.
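
The cookbook summaries in this timeline each carry a result line of the form "• <host> (PASS|WARN|FAIL)". A rough way to tally outcomes per host from pasted timeline text is sketched below. The `outcomes` helper and the shortened `SAMPLE` are illustrative assumptions, not part of the cookbooks or of Phabricator.

```python
import re
from collections import defaultdict

# Matches result lines such as "  • cirrussearch2113 (FAIL)".
RESULT_RE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|WARN|FAIL)\)\s*$")


def outcomes(log_text: str) -> dict[str, list[str]]:
    """Collect cookbook results per host, in the order they appear."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            seen[match.group(1)].append(match.group(2))
    return dict(seen)


# Shortened sample in the shape of the timeline entries.
SAMPLE = """\
  • elastic2101 (PASS)
    • ✔️ Netbox updated
  • cirrussearch2101 (PASS)
    • Add puppet_version metadata (7) to Debian installer
  • cirrussearch2113 (FAIL)
"""

for host, results in outcomes(SAMPLE).items():
    print(host, results)
```

Step lines such as "Add puppet_version metadata (7) to Debian installer" are ignored because only a single token followed by a parenthesised status counts as a result line.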

Change #1138446 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: Add new master-eligibles

https://gerrit.wikimedia.org/r/1138446

Change #1137069 merged by Bking:

[operations/puppet@production] cirrussearch: prepare for eqiad migration

https://gerrit.wikimedia.org/r/1137069

Change #1138449 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: fix typo in regex.yaml

https://gerrit.wikimedia.org/r/1138449

Change #1138449 merged by Bking:

[operations/puppet@production] cirrussearch: fix typo in regex.yaml

https://gerrit.wikimedia.org/r/1138449

Change #1138446 merged by Ryan Kemper:

[operations/puppet@production] cirrussearch: Change whitespace from 4 to 2

https://gerrit.wikimedia.org/r/1138446

Change #1138479 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] cirrus: migrate elastic2061->cirrussearch2061

https://gerrit.wikimedia.org/r/1138479

Change #1138479 merged by Ryan Kemper:

[operations/puppet@production] cirrus: migrate elastic2061->cirrussearch2061

https://gerrit.wikimedia.org/r/1138479

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2061 to cirrussearch2061 completed:

  • elastic2061 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2061.codfw.wmnet with OS bullseye

Change #1138489 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] cirrus: add to-be-renamed masters

https://gerrit.wikimedia.org/r/1138489

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2061.codfw.wmnet with OS bullseye completed:

  • cirrussearch2061 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504232221_bking_2471203_cirrussearch2061.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1138687 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] conftool: Remove mentions to elastic2064

https://gerrit.wikimedia.org/r/1138687

Change #1138687 merged by Vgutierrez:

[operations/puppet@production] conftool: Remove mentions to elastic2064

https://gerrit.wikimedia.org/r/1138687

Change #1138693 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] conftool: Remove mentions to elastic2094

https://gerrit.wikimedia.org/r/1138693

Change #1138693 merged by Vgutierrez:

[operations/puppet@production] conftool: Remove mentions to elastic2094

https://gerrit.wikimedia.org/r/1138693

Change #1138695 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] conftool: Remove no longer existent elastic hosts

https://gerrit.wikimedia.org/r/1138695

Change #1138695 merged by Vgutierrez:

[operations/puppet@production] conftool: Remove no longer existent elastic hosts

https://gerrit.wikimedia.org/r/1138695

Change #1138704 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] conftool: Remove mentions to elastic2095

https://gerrit.wikimedia.org/r/1138704

Change #1138704 merged by Vgutierrez:

[operations/puppet@production] conftool: Remove mentions to elastic2095

https://gerrit.wikimedia.org/r/1138704

Change #1138804 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: remove remaining elastic hosts

https://gerrit.wikimedia.org/r/1138804

Change #1138804 merged by Bking:

[operations/puppet@production] cirrussearch: remove remaining elastic hosts

https://gerrit.wikimedia.org/r/1138804

Change #1138489 merged by Bking:

[operations/puppet@production] cirrus: add to-be-renamed masters

https://gerrit.wikimedia.org/r/1138489

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2073 to cirrussearch2073 completed:

  • elastic2073 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2073.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2073.codfw.wmnet with OS bullseye completed:

  • cirrussearch2073 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504241732_bking_3649904_cirrussearch2073.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2076 to cirrussearch2076 completed:

  • elastic2076 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2076.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2076.codfw.wmnet with OS bullseye completed:

  • cirrussearch2076 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504241955_bking_3795261_cirrussearch2076.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2078 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2078.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1138925 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: allow any cirrussearch host to join cluster

https://gerrit.wikimedia.org/r/1138925

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2080 to cirrussearch2080 completed:

  • elastic2080 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2080.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2080.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2080 (FAIL)
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2080.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2080.codfw.wmnet with OS bullseye

Change #1138925 merged by Bking:

[operations/puppet@production] cirrussearch: allow any cirrussearch host to join cluster

https://gerrit.wikimedia.org/r/1138925

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2078 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2078.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2080.codfw.wmnet with OS bullseye completed:

  • cirrussearch2080 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504242207_bking_3934146_cirrussearch2080.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2078 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2078.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2081 to cirrussearch2081 completed:

  • elastic2081 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2081.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2081.codfw.wmnet with OS bullseye completed:

  • cirrussearch2081 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504251429_bking_724557_cirrussearch2081.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2083 to cirrussearch2083 completed:

  • elastic2083 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2083.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2083.codfw.wmnet with OS bullseye completed:

  • cirrussearch2083 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504251626_bking_866677_cirrussearch2083.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2078 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2078.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2084 to cirrussearch2084 completed:

  • elastic2084 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2084.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2078 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2078.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2084.codfw.wmnet with OS bullseye completed:

  • cirrussearch2084 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504251831_bking_940120_cirrussearch2084.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2078 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2078.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2078 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2078.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2086 to cirrussearch2086 completed:

  • elastic2086 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2086.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2086.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2086 (FAIL)
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2086.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2086.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2086.codfw.wmnet with OS bullseye completed:

  • cirrussearch2086 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504260254_bking_1224546_cirrussearch2086.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2092 to cirrussearch2092 completed:

  • elastic2092 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2092.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2092.codfw.wmnet with OS bullseye completed:

  • cirrussearch2092 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504281422_bking_917112_cirrussearch2092.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2093 to cirrussearch2093 completed:

  • elastic2093 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2093.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2093.codfw.wmnet with OS bullseye executed with errors:

  • cirrussearch2093 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch2093.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2100 to cirrussearch2100 completed:

  • elastic2100 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2100.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2100.codfw.wmnet with OS bullseye completed:

  • cirrussearch2100 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504282032_bking_1179440_cirrussearch2100.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2078.codfw.wmnet with OS bullseye completed:

  • cirrussearch2078 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504291416_bking_2400239_cirrussearch2078.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2106 to cirrussearch2106 completed:

  • elastic2106 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2106.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.rename started by bking@cumin2002 from elastic2108 to cirrussearch2108 completed:

  • elastic2108 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch2108.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2106.codfw.wmnet with OS bullseye completed:

  • cirrussearch2106 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504291526_bking_2453344_cirrussearch2106.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch2108.codfw.wmnet with OS bullseye completed:

  • cirrussearch2108 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504291553_bking_2491396_cirrussearch2108.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
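
To get a quick picture of where each host stands from cookbook output like the above, the per-host summary bullets (for example "• cirrussearch2086 (WARN)") can be tallied. The following is a minimal sketch, not part of the cookbooks or Spicerack: the `latest_status` helper and the `STATUS_RE` pattern are hypothetical names, and it assumes the log text has been copied into a string in the same bullet format shown in this task. Note that rename entries report the old hostname (e.g. elastic2086), so they are tracked separately from the reimage entries for the new name.

```python
import re
from collections import Counter

# Hypothetical pattern: matches summary bullets such as "  • cirrussearch2084 (WARN)".
# Step bullets (e.g. "    • Rebooted") carry no "(STATUS)" suffix and are ignored.
STATUS_RE = re.compile(
    r"^\s*•\s+(?P<host>[a-z]+\d+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$"
)


def latest_status(log_text):
    """Return the most recent status per host, assuming the log is in chronological order."""
    latest = {}
    for line in log_text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            # Later entries overwrite earlier ones, so a retry that passes wins.
            latest[match.group("host")] = match.group("status")
    return latest


# Sample input mirroring the bullet format of the cookbook summaries.
sample = """\
  • cirrussearch2078 (FAIL)
    • Forced PXE for next reboot
  • elastic2086 (PASS)
  • cirrussearch2086 (FAIL)
  • cirrussearch2086 (WARN)
  • cirrussearch2078 (PASS)
"""

statuses = latest_status(sample)
summary = Counter(statuses.values())
for host, status in sorted(statuses.items()):
    print(host, status)
print(dict(summary))
```

Hosts whose latest status is WARN or FAIL are the ones worth a manual check in Icinga or the cookbook logs.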