Reimage external store hosts with Bookworm
Closed, Resolved | Public

Description

These hosts were all migrated to 10.6 but are still running Bullseye; they need to be reimaged with Bookworm.

es4

  • es2020 master
  • es2021
  • es2022
  • es1020
  • es1021 master
  • es1022

es5

  • es2023 master
  • es2024
  • es2025
  • es1023
  • es1024 master
  • es1025
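For anyone scripting against this checklist, the host list above can be expressed as data. This is a minimal sketch, not official tooling: `SECTIONS`, `fqdn`, and `all_hosts` are hypothetical names, and the datacenter rule (es1xxx in eqiad, es2xxx in codfw) is an assumption inferred from the FQDNs that appear in the reimage logs on this task.

```python
# Hypothetical inventory mirroring the host list in the task description.
SECTIONS = {
    "es4": {
        "masters": ["es2020", "es1021"],
        "replicas": ["es2021", "es2022", "es1020", "es1022"],
    },
    "es5": {
        "masters": ["es2023", "es1024"],
        "replicas": ["es2024", "es2025", "es1023", "es1025"],
    },
}


def fqdn(host: str) -> str:
    """Build an FQDN, assuming es1xxx lives in eqiad and es2xxx in codfw."""
    datacenter = {"1": "eqiad", "2": "codfw"}[host[2]]
    return f"{host}.{datacenter}.wmnet"


def all_hosts(section: str) -> list[str]:
    """Return every host of a section (masters and replicas), sorted by name."""
    entry = SECTIONS[section]
    return sorted(entry["masters"] + entry["replicas"])


if __name__ == "__main__":
    for section in SECTIONS:
        for host in all_hosts(section):
            role = "master" if host in SECTIONS[section]["masters"] else "replica"
            print(section, fqdn(host), role)
```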

Event Timeline

Marostegui triaged this task as Medium priority. Mon, May 6, 6:34 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1020.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1020.eqiad.wmnet with OS bookworm completed:

  • es1020 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405060732_marostegui_577385_es1020.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
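The cookbook summaries posted on this task all share the same shape: a host line with an overall status in parentheses, followed by one step per bullet. As a rough illustration, the sketch below parses such a summary and flags whether the Icinga downtime was removed. The function name `parse_reimage_summary` and the returned dictionary layout are hypothetical; only the text patterns are taken from the log above.

```python
import re


def parse_reimage_summary(text: str) -> dict:
    """Extract host, overall status, and the step list from a cookbook summary."""
    host, status, steps = None, None, []
    for raw in text.splitlines():
        # Drop indentation and the leading bullet character, if any.
        line = raw.strip().lstrip("•").strip()
        if not line:
            continue
        header = re.fullmatch(r"(\S+) \(([A-Z]+)\)", line)
        if header:
            host, status = header.group(1), header.group(2)
        elif host is not None:
            steps.append(line)
    return {
        "host": host,
        "status": status,
        "steps": steps,
        # True only when the cookbook reported removing the downtime itself.
        "downtime_removed": "Icinga downtime removed" in steps,
    }


SAMPLE = """
  • es1020 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
"""

result = parse_reimage_summary(SAMPLE)
print(result["host"], result["status"], result["downtime_removed"])
```

A summary where `downtime_removed` is false presumably still needs a manual check of the host before its downtime is lifted.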

Change #1028365 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1020: Remove package declaration

https://gerrit.wikimedia.org/r/1028365

Change #1028365 merged by Marostegui:

[operations/puppet@production] es1020: Remove package declaration

https://gerrit.wikimedia.org/r/1028365

Mentioned in SAL (#wikimedia-operations) [2024-05-06T08:04:24Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1025 T364289', diff saved to https://phabricator.wikimedia.org/P61890 and previous config saved to /var/cache/conftool/dbconfig/20240506-080423-root.json
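The SAL entries for `dbctl commit` follow a fixed layout, so the depooled host, the task ID, and the saved diff can be pulled out mechanically when auditing a rollout. The following is a sketch under that assumption: the regular expression is derived only from the entries visible on this task, and `parse_dbctl_sal` is a hypothetical helper, not part of dbctl or conftool.

```python
import re

# Pattern derived from the SAL lines on this task; other entries may differ.
SAL_PATTERN = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] <(?P<user>[^>]+)> "
    r"dbctl commit \(dc=(?P<dc>\w+)\): '(?P<message>[^']*)', "
    r"diff saved to (?P<diff>\S+) "
    r"and previous config saved to (?P<backup>\S+)"
)


def parse_dbctl_sal(line: str):
    """Return the fields of a dbctl commit SAL entry, or None if it does not match."""
    match = SAL_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The task ID (for example T364289) is embedded in the commit message.
    task = re.search(r"\bT\d+\b", fields["message"])
    fields["task"] = task.group(0) if task else None
    return fields


LINE = (
    "Mentioned in SAL (#wikimedia-operations) [2024-05-06T08:04:24Z] "
    "<marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1025 T364289', "
    "diff saved to https://phabricator.wikimedia.org/P61890 and previous config "
    "saved to /var/cache/conftool/dbconfig/20240506-080423-root.json"
)

entry = parse_dbctl_sal(LINE)
print(entry["message"], entry["task"], entry["diff"])
```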

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1025.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1025.eqiad.wmnet with OS bookworm completed:

  • es1025 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405060822_marostegui_585872_es1025.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2024.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2024.codfw.wmnet with OS bookworm completed:

  • es2024 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405061044_marostegui_641087_es2024.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2021.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2021.codfw.wmnet with OS bookworm completed:

  • es2021 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405061350_marostegui_830494_es2021.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Change #1028513 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2021: Remove package declaration

https://gerrit.wikimedia.org/r/1028513

Change #1028513 merged by Marostegui:

[operations/puppet@production] es2021: Remove package declaration

https://gerrit.wikimedia.org/r/1028513

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2025.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2022.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2025.codfw.wmnet with OS bookworm completed:

  • es2025 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405080528_marostegui_1246868_es2025.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2022.codfw.wmnet with OS bookworm completed:

  • es2022 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405080611_marostegui_1254735_es2022.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2023.codfw.wmnet with OS bookworm

Change #1029120 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es2022: Remove package declaration

https://gerrit.wikimedia.org/r/1029120

Change #1029120 merged by Marostegui:

[operations/puppet@production] es2022: Remove package declaration

https://gerrit.wikimedia.org/r/1029120

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2023.codfw.wmnet with OS bookworm completed:

  • es2023 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405080835_marostegui_1274691_es2023.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1029133 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1022: Disable notifications

https://gerrit.wikimedia.org/r/1029133

Mentioned in SAL (#wikimedia-operations) [2024-05-08T09:06:22Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1022 T364289', diff saved to https://phabricator.wikimedia.org/P62038 and previous config saved to /var/cache/conftool/dbconfig/20240508-090621-root.json

Change #1029133 merged by Marostegui:

[operations/puppet@production] es1022: Disable notifications

https://gerrit.wikimedia.org/r/1029133

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1022.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1022.eqiad.wmnet with OS bookworm completed:

  • es1022 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405080925_marostegui_1284070_es1022.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es2020.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es2020.codfw.wmnet with OS bookworm completed:

  • es2020 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405090650_marostegui_1672825_es2020.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Mentioned in SAL (#wikimedia-operations) [2024-05-16T07:48:37Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1021 T364289', diff saved to https://phabricator.wikimedia.org/P62454 and previous config saved to /var/cache/conftool/dbconfig/20240516-074837-root.json

Change #1032385 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1021: Disable notifications

https://gerrit.wikimedia.org/r/1032385

Change #1032385 merged by Marostegui:

[operations/puppet@production] es1021: Disable notifications

https://gerrit.wikimedia.org/r/1032385

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1021.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1021.eqiad.wmnet with OS bookworm completed:

  • es1021 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405160808_marostegui_361726_es1021.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1032403 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1021: Remove package declaration

https://gerrit.wikimedia.org/r/1032403

Change #1032403 merged by Marostegui:

[operations/puppet@production] es1021: Remove package declaration

https://gerrit.wikimedia.org/r/1032403

Change #1032479 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1024: Disable notifications

https://gerrit.wikimedia.org/r/1032479

Mentioned in SAL (#wikimedia-operations) [2024-05-16T13:11:12Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool es1024 T364289', diff saved to https://phabricator.wikimedia.org/P62501 and previous config saved to /var/cache/conftool/dbconfig/20240516-131111-root.json

Change #1032479 merged by Marostegui:

[operations/puppet@production] es1024: Disable notifications

https://gerrit.wikimedia.org/r/1032479

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1024.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1024.eqiad.wmnet with OS bookworm completed:

  • es1024 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405161331_marostegui_407297_es1024.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1032620 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] es1024: Remove package declaration

https://gerrit.wikimedia.org/r/1032620

Change #1032620 merged by Marostegui:

[operations/puppet@production] es1024: Remove package declaration

https://gerrit.wikimedia.org/r/1032620

Change #1032622 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] site.pp: Reorganize es5 hosts

https://gerrit.wikimedia.org/r/1032622

Change #1032622 merged by Marostegui:

[operations/puppet@production] site.pp: Reorganize es5 hosts

https://gerrit.wikimedia.org/r/1032622