Migrate es2 section to Debian Trixie
Closed, Resolved (Public)

Description

  • es2056
  • es2054
  • es2049
  • es1056
  • es1053
  • es1049

Event Timeline

Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to Ready on the DBA board.

Completed depooling of es2056 by marostegui@cumin1003: Upgrading es2056.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2056.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2056.codfw.wmnet with OS trixie completed:

  • es2056 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606021028_marostegui_4060565_es2056.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Completed depooling of es2049 by marostegui@cumin1003: Upgrading es2049.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2049.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2049.codfw.wmnet with OS trixie completed:

  • es2049 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606021211_marostegui_4137631_es2049.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Completed depooling of es1056 by marostegui@cumin1003: Upgrading es1056.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1056.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1056.eqiad.wmnet with OS trixie completed:

  • es1056 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030539_marostegui_509014_es1056.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2026-06-03T06:46:24Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Promote es2056 to es2 codfw primary T427875', diff saved to https://phabricator.wikimedia.org/P93632 and previous config saved to /var/cache/conftool/dbconfig/20260603-064623-marostegui.json

es2054 is no longer the es2 codfw primary; es2056 is the new primary.

Completed depooling of es1049 by marostegui@cumin1003: Upgrading es1049.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1049.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1049.eqiad.wmnet with OS trixie completed:

  • es1049 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030714_marostegui_520450_es1049.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Change #1296952 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Update es2-master alias

https://gerrit.wikimedia.org/r/1296952

Mentioned in SAL (#wikimedia-operations) [2026-06-03T07:42:50Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Promote es1056 to es2 eqiad primary T427875', diff saved to https://phabricator.wikimedia.org/P93637 and previous config saved to /var/cache/conftool/dbconfig/20260603-074250-marostegui.json

Change #1296952 merged by Marostegui:

[operations/dns@master] wmnet: Update es2-master alias

https://gerrit.wikimedia.org/r/1296952

Completed depooling of es2054 by marostegui@cumin1003: Upgrading es2054.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2054.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2054.codfw.wmnet with OS trixie completed:

  • es2054 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030814_marostegui_531258_es2054.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Completed depooling of es1053 by marostegui@cumin1003: Upgrading es1053.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1053.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1053.eqiad.wmnet with OS trixie executed with errors:

  • es1053 (FAIL)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030941_marostegui_606482_es1053.out
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es1053.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

The migration of the es2 section to Debian Trixie has been done.
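
For reference, the per-host results reported by the reimage cookbook in the timeline above can be tallied with a short script. This is only a minimal sketch: the status lines are copied by hand from the reports in this task, and the names `REPORT_LINES`, `parse_statuses` and `hosts_needing_followup` are hypothetical helpers for illustration, not part of spicerack or the cookbooks.

```python
import re

# Status lines copied by hand from the cookbook reports in this task.
REPORT_LINES = [
    "  • es2056 (WARN)",
    "  • es2049 (WARN)",
    "  • es1056 (WARN)",
    "  • es1049 (WARN)",
    "  • es2054 (WARN)",
    "  • es1053 (FAIL)",
]

# Matches a report header such as "• es2056 (WARN)".
STATUS_RE = re.compile(r"^\s*•\s*(?P<host>es\d+)\s+\((?P<status>[A-Z]+)\)\s*$")


def parse_statuses(lines):
    """Return a {host: status} mapping for every line that looks like a report header."""
    statuses = {}
    for line in lines:
        match = STATUS_RE.match(line)
        if match:
            statuses[match.group("host")] = match.group("status")
    return statuses


def hosts_needing_followup(statuses):
    """Return the sorted list of hosts whose report ended in FAIL."""
    return sorted(host for host, status in statuses.items() if status == "FAIL")


statuses = parse_statuses(REPORT_LINES)
print(hosts_needing_followup(statuses))
```

With the lines above, only es1053 is flagged, matching the one report that ended with "executed with errors" (the downtime step returned 99).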