
Migrate production Elastic clusters to OpenSearch
Closed, Resolved, Public

Description

Now that we have successfully migrated our smaller clusters from Elastic to OpenSearch (see T380752 and T387904), it is time to migrate our production clusters as well.

Current status

Progress by number of hosts (updated at end of shift):
overall: Complete
codfw: 60/61 hosts migrated - we're calling this complete

  • currently broken:
    • cirrussearch2091 - continually failing PXE and has history of hardware issues (ref T391639)

eqiad: 60/60 hosts migrated - we're calling this complete
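
As a rough illustration of the arithmetic behind these status lines, the sketch below turns the per-site host counts into percentages. The counts are copied from the status above; the function names are hypothetical and not part of any existing tooling.

```python
# Host counts copied from the status lines above: (migrated, total).
progress = {
    "codfw": (60, 61),
    "eqiad": (60, 60),
}


def percent(done: int, total: int) -> float:
    """Return migrated hosts as a percentage, rounded to one decimal place."""
    return round(100 * done / total, 1)


def summarize(counts: dict) -> dict:
    """Return per-site percentages plus an 'overall' entry across all sites."""
    summary = {site: percent(done, total) for site, (done, total) in counts.items()}
    done_all = sum(done for done, _ in counts.values())
    total_all = sum(total for _, total in counts.values())
    summary["overall"] = percent(done_all, total_all)
    return summary


for site, pct in summarize(progress).items():
    print(f"{site}: {pct}%")
```

The one unmigrated codfw host (cirrussearch2091) is why the overall figure stays just under 100% even though the migration is considered complete.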

Open patches
See the "elastic-2-opensearch" Gerrit topic

Checking progress manually

You can check site.pp in the Puppet repo to see how many hosts use the old elasticsearch::cirrus role versus the new cirrus::opensearch role.

If you have cumin access, you can also run the following commands:

  • old role: sudo cumin O:elasticsearch::cirrus
  • new role: sudo cumin O:cirrus::opensearch
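
For the site.pp check, a minimal sketch might look like the following. It assumes roles are assigned with role(...) calls inside node blocks; the SAMPLE text is a made-up excerpt, not the real file, and count_roles is a hypothetical helper. Note that it counts node definitions rather than hosts, because a single node regex can cover many hosts; the cumin queries remain the more accurate way to get host counts.

```python
import re
from collections import Counter

OLD_ROLE = "elasticsearch::cirrus"
NEW_ROLE = "cirrus::opensearch"

# Matches role(<name>) calls; assumes roles are assigned this way in site.pp.
ROLE_CALL = re.compile(r"\brole\(\s*([\w:]+)\s*\)")


def count_roles(site_pp_text: str) -> Counter:
    """Count how many role(...) calls reference each role name."""
    return Counter(ROLE_CALL.findall(site_pp_text))


# Hypothetical excerpt for illustration only; the real site.pp will differ.
SAMPLE = r"""
node /^cirrussearch10(89|90)\.eqiad\.wmnet$/ {
    role(cirrus::opensearch)
}
node /^elastic1110\.eqiad\.wmnet$/ {
    role(elasticsearch::cirrus)
}
"""

counts = count_roles(SAMPLE)
print(f"old role node blocks: {counts[OLD_ROLE]}")
print(f"new role node blocks: {counts[NEW_ROLE]}")
```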

Creating this ticket to:

  • Create a plan. Documented here
  • Notify stakeholders
  • Migrate each host from Elastic to OpenSearch and confirm operation
    • also rename each host like so: s/elastic/cirrussearch/
  • Update docs
  • Roll back any config we temporarily changed for the migration (T391350)
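
The rename rule in the list above can be sketched as a one-line substitution. This assumes the numeric suffix and domain are carried over unchanged, which the ticket implies but does not state outright; new_hostname is a hypothetical helper and elastic1089 is used purely as an example input.

```python
import re


def new_hostname(old: str) -> str:
    """Apply s/elastic/cirrussearch/ to the leading host prefix only."""
    # The lookahead keeps the substitution from touching names that merely
    # start with "elastic" but are not followed by a host number.
    return re.sub(r"^elastic(?=\d)", "cirrussearch", old)


print(new_hostname("elastic1089.eqiad.wmnet"))
```

Hosts that already carry the new prefix pass through unchanged, so the function is safe to apply to a mixed list.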

Details

Other Assignee
RKemper
Related Changes in Gerrit:
Repo                         Branch      Lines +/-
operations/mediawiki-config  master      +3 -3
operations/puppet            production  +1 -1
operations/puppet            production  +17 -35
operations/puppet            production  +0 -6
operations/puppet            production  +1 -1
operations/mediawiki-config  master      +2 -2
operations/puppet            production  +0 -22
operations/mediawiki-config  master      +2 -2
operations/puppet            production  +0 -210
operations/puppet            production  +1 -0
operations/puppet            production  +14 -14
operations/puppet            production  +6 -6
operations/puppet            production  +8 -2
operations/puppet            production  +24 -34
operations/puppet            production  +22 -20
operations/puppet            production  +17 -19
operations/puppet            production  +2 -0
operations/puppet            production  +20 -14
operations/puppet            production  +396 -402
operations/puppet            production  +7 -1
operations/puppet            production  +0 -1
operations/puppet            production  +23 -4
operations/puppet            production  +28 -4
operations/puppet            production  +15 -10
operations/mediawiki-config  master      +3 -3
operations/puppet            production  +87 -0
operations/puppet            production  +1 -0
operations/cookbooks         master      +14 -3
operations/puppet            production  +1 -1
operations/puppet            production  +1 -89
operations/puppet            production  +11 -0
operations/puppet            production  +273 -1
operations/puppet            production  +22 -2
operations/puppet            production  +0 -12
operations/puppet            production  +0 -1
operations/puppet            production  +0 -3
operations/puppet            production  +0 -1
operations/puppet            production  +0 -1
operations/puppet            production  +2 -0
operations/puppet            production  +651 -651
operations/puppet            production  +1 -1
operations/puppet            production  +16 -6
operations/puppet            production  +6 -5
operations/puppet            production  +4 -5
operations/puppet            production  +4 -4
operations/puppet            production  +12 -10
operations/puppet            production  +28 -2
operations/puppet            production  +1 -1
operations/puppet            production  +8 -8
operations/puppet            production  +4 -0
operations/puppet            production  +8 -7
operations/puppet            production  +3 -6
operations/puppet            production  +0 -11
operations/puppet            production  +1 -0
operations/puppet            production  +1 -1
operations/puppet            production  +1 -1
operations/puppet            production  +1 -1
operations/puppet            production  +1 -1
operations/puppet            production  +6 -1
operations/puppet            production  +5 -4
operations/puppet            production  +13 -1
operations/puppet            production  +1 -5
operations/puppet            production  +40 -40
operations/puppet            production  +5 -1
operations/puppet            production  +1 -1
operations/puppet            production  +0 -11
operations/puppet            production  +5 -7
operations/puppet            production  +43 -43
operations/cookbooks         master      +11 -2
operations/puppet            production  +1 -0
operations/puppet            production  +19 -5
operations/puppet            production  +1 -1
operations/puppet            production  +94 -0
operations/puppet            production  +2 -0
operations/puppet            production  +16 -18
operations/puppet            production  +13 -16
operations/puppet            production  +3 -3
operations/puppet            production  +6 -6
operations/puppet            production  +4 -6
operations/puppet            production  +1 -1
operations/puppet            production  +2 -2
operations/puppet            production  +12 -11
operations/puppet            production  +2 -2
operations/puppet            production  +2 -2
operations/puppet            production  +0 -0
operations/puppet            production  +1 -1
operations/puppet            production  +1 -0
operations/puppet            production  +1 -1
operations/puppet            production  +0 -0
operations/puppet            production  +0 -1
operations/puppet            production  +6 -9
operations/puppet            production  +101 -6
operations/puppet            production  +44 -0
operations/mediawiki-config  master      +2 -2
operations/puppet            production  +5 -0
operations/puppet            production  +47 -5
operations/mediawiki-config  master      +1 -1
operations/puppet            production  +10 -0
operations/puppet            production  +462 -0

Event Timeline

There are a very large number of changes, so older changes are hidden.

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1089.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1090.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1089.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1089 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230420_ryankemper_271156_cirrussearch1089.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
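
Reimage summaries like the one above carry a per-host status in parentheses. A small sketch for tallying those statuses from pasted timeline text follows; the PASS/WARN/FAIL set is taken from the entries in this timeline, and the tally helper and SAMPLE text are hypothetical illustrations rather than part of the cookbook.

```python
import re
from collections import Counter

# Matches summary lines such as "  • cirrussearch1089 (WARN)".
RESULT_LINE = re.compile(
    r"^[ \t]*•[ \t]+(\S+)[ \t]+\((PASS|WARN|FAIL)\)[ \t]*$", re.MULTILINE
)


def tally(timeline: str) -> Counter:
    """Count how many reimage summaries ended in each status."""
    return Counter(status for _host, status in RESULT_LINE.findall(timeline))


# Summary lines copied from this timeline; the step bullets are ignored
# because they do not end in a parenthesised status.
SAMPLE = """
  • cirrussearch1089 (WARN)
    • Host successfully migrated to the new VLAN
  • cirrussearch1090 (WARN)
  • cirrussearch1092 (PASS)
"""

print(tally(SAMPLE))
```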

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1090.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1090 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230439_ryankemper_280574_cirrussearch1090.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
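
The first-Puppet-run log paths in these summaries appear to follow a timestamp_user_number_host.out pattern. The sketch below splits a path on that assumption; reading the first field as YYYYMMDDHHMM and treating the third field as an opaque run identifier are guesses from the names seen here, not documented behaviour, and ReimageLog and parse_log_path are hypothetical names.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # assumed to be YYYYMMDDHHMM; timezone not stated
    user: str
    run_id: str        # meaning of this field is an assumption
    host: str


def parse_log_path(path: str) -> ReimageLog:
    """Split a reimage log path into its underscore-separated fields."""
    stem = PurePosixPath(path).stem  # drops the directory and ".out"
    stamp, user, run_id, host = stem.split("_", 3)
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, run_id, host)


log = parse_log_path(
    "/var/log/spicerack/sre/hosts/reimage/"
    "202505230439_ryankemper_280574_cirrussearch1090.out"
)
print(log.host, log.user, log.started.isoformat())
```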

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1092.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1091.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1092.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1092 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230559_ryankemper_319581_cirrussearch1092.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1091.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1091 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230555_ryankemper_319840_cirrussearch1091.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1093.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1094.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1093.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1093 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230654_ryankemper_349726_cirrussearch1093.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1094.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1094 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230702_ryankemper_349761_cirrussearch1094.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1108.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1095.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host cirrussearch1109.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1095.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1095 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230815_ryankemper_389336_cirrussearch1095.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1108.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1108 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230819_ryankemper_389426_cirrussearch1108.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host cirrussearch1109.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1109 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505230857_ryankemper_406013_cirrussearch1109.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1149687 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: add cirrussearch row E/remove elastic row F

https://gerrit.wikimedia.org/r/1149687

Change #1149687 merged by Bking:

[operations/puppet@production] cirrussearch: add cirrussearch row E/remove elastic row F

https://gerrit.wikimedia.org/r/1149687

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1096.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1097.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1096.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1096 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505231542_bking_589941_cirrussearch1096.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1097.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1097 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505231556_bking_598011_cirrussearch1097.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1098.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1099.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1099.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1099 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505231802_bking_655433_cirrussearch1099.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1098.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1098 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505231753_bking_652202_cirrussearch1098.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1100.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1101.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1100.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1100 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505231927_bking_699335_cirrussearch1100.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1102.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1101.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1101 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505231945_bking_704899_cirrussearch1101.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1107.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1107.eqiad.wmnet with OS bullseye executed with errors:

  • cirrussearch1107 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cirrussearch1107.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1107.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1102.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1102 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505232018_bking_725184_cirrussearch1102.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1107.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1107 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505232103_bking_744874_cirrussearch1107.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
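
cirrussearch1107 first shows up in this timeline as FAIL and then, after a second reimage, as WARN. When summarizing a run, it is usually the latest result per host that matters. A minimal sketch of that idea, assuming summaries are processed in timeline order (final_status is a hypothetical helper):

```python
import re

# Matches summary lines such as "  • cirrussearch1107 (FAIL)".
RESULT_LINE = re.compile(
    r"^[ \t]*•[ \t]+(\S+)[ \t]+\((PASS|WARN|FAIL)\)[ \t]*$", re.MULTILINE
)


def final_status(timeline: str) -> dict[str, str]:
    """Return the last reported status for each host, in timeline order."""
    latest: dict[str, str] = {}
    for host, status in RESULT_LINE.findall(timeline):
        latest[host] = status  # later entries overwrite earlier ones
    return latest


# Summary lines copied from this timeline, in the order they were posted.
SAMPLE = """
  • cirrussearch1107 (FAIL)
  • cirrussearch1102 (WARN)
  • cirrussearch1107 (WARN)
"""

still_failed = [h for h, s in final_status(SAMPLE).items() if s == "FAIL"]
print(final_status(SAMPLE))
print("hosts still needing a retry:", still_failed)
```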

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1110.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1110.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1110 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505271512_bking_3150298_cirrussearch1110.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cirrussearch1103.eqiad.wmnet with OS bullseye

Change #1151256 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] elastic/cirrussearch: re-enable monitoring for eqiad

https://gerrit.wikimedia.org/r/1151256

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cirrussearch1103.eqiad.wmnet with OS bullseye completed:

  • cirrussearch1103 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505271616_bking_3181218_cirrussearch1103.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
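
For anyone auditing these reimage summaries after the fact, here is a minimal sketch of how the pasted output could be scanned for hosts that still need attention, such as cirrussearch1110 above, which finished as WARN with its downtime left in place. It assumes the summary layout is exactly what appears in this task: a bulleted host line ending in a parenthesised status, followed by bulleted steps. The names `summarize_reimage` and `needs_followup` are hypothetical helpers, not part of Spicerack or the reimage cookbook.

```python
import re

# Assumption: layout inferred from the cookbook summaries pasted in this task,
# e.g. "  • cirrussearch1110 (WARN)" followed by indented "• step" lines.
HOST_LINE = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")
STEP_LINE = re.compile(r"^\s*•\s+(?P<step>.+?)\s*$")


def summarize_reimage(text):
    """Return {host: {"status": str, "steps": [str, ...]}} from pasted summaries."""
    results = {}
    current = None
    for line in text.splitlines():
        host_match = HOST_LINE.match(line)
        if host_match:
            current = host_match.group("host")
            results[current] = {"status": host_match.group("status"), "steps": []}
            continue
        step_match = STEP_LINE.match(line)
        if step_match and current is not None:
            results[current]["steps"].append(step_match.group("step"))
        elif line.strip():
            # Any other non-blank line (e.g. a "Cookbook ... started" notice)
            # ends the current host's step list.
            current = None
    return results


def needs_followup(results):
    """Hosts whose status is not PASS or whose Icinga downtime was left in place."""
    return sorted(
        host
        for host, info in results.items()
        if info["status"] != "PASS"
        or any("downtime not removed" in step for step in info["steps"])
    )


SAMPLE = """\
  • cirrussearch1110 (WARN)
    • Icinga status is not optimal, downtime not removed
  • cirrussearch1103 (PASS)
    • Icinga status is optimal
    • Icinga downtime removed
"""

print(needs_followup(summarize_reimage(SAMPLE)))  # ['cirrussearch1110']
```

A host flagged this way would still need its Icinga checks reviewed and its downtime removed by hand once the services recover.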

Change #1151294 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: add row F, remove soon-to-be-decom hosts

https://gerrit.wikimedia.org/r/1151294

Change #1131715 abandoned by Bking:

[operations/puppet@production] elastic: Change first batch of prod elastic hosts to OpenSearch

Reason:

We've already migrated to OpenSearch

https://gerrit.wikimedia.org/r/1131715

Change #1151294 merged by Bking:

[operations/puppet@production] cirrussearch: add row F, remove soon-to-be-decom hosts

https://gerrit.wikimedia.org/r/1151294

Change #1152830 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] cirrus: add missing entry for cirrussearch2061

https://gerrit.wikimedia.org/r/1152830

Change #1152830 merged by Ryan Kemper:

[operations/puppet@production] cirrus: add missing entry for cirrussearch2061

https://gerrit.wikimedia.org/r/1152830

Change #1155288 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: remove references to defunct elastic hosts, part 1

https://gerrit.wikimedia.org/r/1155288

Change #1155288 merged by Bking:

[operations/puppet@production] cirrussearch: remove references to defunct elastic hosts, part 1

https://gerrit.wikimedia.org/r/1155288

Change #1154828 had a related patch set uploaded (by Bking; author: Ebernhardson):

[operations/mediawiki-config@master] search: Return traffic to all DCs

https://gerrit.wikimedia.org/r/1154828

Change #1154828 merged by Bking:

[operations/mediawiki-config@master] search: Return traffic to all DCs

https://gerrit.wikimedia.org/r/1154828

Change #1155738 had a related patch set uploaded (by Bking; author: Bking):

[operations/mediawiki-config@master] cirrussearch: return traffic to all DCs

https://gerrit.wikimedia.org/r/1155738

Change #1159507 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: remove non-existent hosts

https://gerrit.wikimedia.org/r/1159507

Change #1159507 merged by Bking:

[operations/puppet@production] cirrussearch: remove non-existent hosts

https://gerrit.wikimedia.org/r/1159507

Change #1155738 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrussearch: return traffic to all DCs

https://gerrit.wikimedia.org/r/1155738

Mentioned in SAL (#wikimedia-operations) [2025-06-17T20:04:58Z] <ebernhardson@deploy1003> Started scap sync-world: Backport for [[gerrit:1155738|cirrussearch: return traffic to all DCs (T388610)]]

Mentioned in SAL (#wikimedia-operations) [2025-06-17T20:07:14Z] <ebernhardson@deploy1003> bking, ebernhardson: Backport for [[gerrit:1155738|cirrussearch: return traffic to all DCs (T388610)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Change #1162029 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] cirrussearch: set host to correct lb pool

https://gerrit.wikimedia.org/r/1162029

Change #1162029 merged by Bking:

[operations/puppet@production] cirrussearch: set host to correct lb pool

https://gerrit.wikimedia.org/r/1162029

I'm happy to report that the Elastic->OpenSearch migration is complete. Related work continues in T391350, but I think we are ready to close this task. Thanks to everyone who helped out with this effort!

Change #1151256 merged by Bking:

[operations/puppet@production] elastic/cirrussearch: re-enable monitoring for eqiad

https://gerrit.wikimedia.org/r/1151256

Change #1146041 abandoned by Bking:

[operations/puppet@production] elastic/cirrussearch: prepare hosts for decommission

Reason:

We've already addressed this, see T394350

https://gerrit.wikimedia.org/r/1146041

Change #1144639 abandoned by Bking:

[operations/puppet@production] elastic: don't filter out self cluster settings

Reason:

already addressed by Iaf4ff43fc982f52b9eef34dc6cdb66b3e3e74a07

https://gerrit.wikimedia.org/r/1144639

Change #1129183 abandoned by DCausse:

[operations/mediawiki-config@master] cirrus: switch search traffic back to multi-DC

Reason:

no longer needed

https://gerrit.wikimedia.org/r/1129183