
Migrate es3 section to Debian Trixie
Closed, Resolved, Public

Description

  • es2057
  • es2052
  • es2050
  • es1057
  • es1054
  • es1051

Details

Related Changes in Gerrit:

Event Timeline

Completed depooling of es2057 by marostegui@cumin1003: Upgrading es2057.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2057.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2057.codfw.wmnet with OS trixie completed:

  • es2057 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606040954_marostegui_1078552_es2057.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB
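Every cookbook summary in this task has the same shape. It starts with a host line that carries a status in parentheses, followed by one bullet per step. The sketch below parses that text as it is pasted here and reports three things: the host, the status, and whether the Icinga/Alertmanager downtime was left in place because `--no-check-icinga` was set. `ReimageSummary` and `parse_summary` are hypothetical helpers and are not part of spicerack. The sketch also does not assume that `--no-check-icinga` caused the `WARN` status. It only reports both values.

```python
import re
from dataclasses import dataclass, field

# Hypothetical helper types; not part of spicerack.
HEADER_RE = re.compile(r"^\s*•\s*(\S+)\s+\((\w+)\)\s*$")  # "• es2057 (WARN)"
STEP_RE = re.compile(r"^\s*•\s*(.+?)\s*$")                 # "• Disabled Puppet"

@dataclass
class ReimageSummary:
    host: str
    status: str
    steps: list[str] = field(default_factory=list)

    @property
    def downtime_left_in_place(self) -> bool:
        # The summaries here print this step when --no-check-icinga was passed.
        return any("--no-check-icinga was set" in s for s in self.steps)

def parse_summary(text: str) -> ReimageSummary:
    lines = [line for line in text.splitlines() if line.strip()]
    header = HEADER_RE.match(lines[0])
    if header is None:
        raise ValueError("first line must look like '• <host> (<STATUS>)'")
    summary = ReimageSummary(host=header.group(1), status=header.group(2))
    for line in lines[1:]:
        step = STEP_RE.match(line)
        if step:
            summary.steps.append(step.group(1))
    return summary

# Excerpt of the es2057 summary above.
sample = """\
  • es2057 (WARN)
    • Disabled Puppet
    • Rebooted
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
"""
s = parse_summary(sample)
print(s.host, s.status, len(s.steps), s.downtime_left_in_place)
# es2057 WARN 3 True
```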

Completed depooling of es2050 by marostegui@cumin1003: Upgrading es2050.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2050.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2050.codfw.wmnet with OS trixie completed:

  • es2050 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606041123_marostegui_1091846_es2050.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB
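The per-run log files all follow one naming pattern, for example `202606041123_marostegui_1091846_es2050.out`. The sketch below splits such a name into its fields under a few assumptions. The first field is taken to be a `YYYYMMDDHHMM` start time, which fits the order of the runs in this task. The second field is taken to be the operator and the last one the host. The task does not say what the numeric third field means, so the sketch keeps it as an opaque `run_id`. It also does not state a timezone, so the parsed datetime is naive. `ReimageLog` and `parse_log_name` are hypothetical names.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple

class ReimageLog(NamedTuple):
    started: datetime  # assumed YYYYMMDDHHMM start time; timezone not stated
    user: str
    run_id: int        # meaning not stated in the task; kept as an opaque number
    host: str

def parse_log_name(path: str) -> ReimageLog:
    stem = PurePosixPath(path).stem            # drop the directory and ".out"
    head, run_id, host = stem.rsplit("_", 2)   # host and run_id are the last two fields
    stamp, user = head.split("_", 1)           # timestamp is the first field
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, int(run_id), host)

log = parse_log_name(
    "/var/log/spicerack/sre/hosts/reimage/202606041123_marostegui_1091846_es2050.out"
)
print(log.host, log.started.isoformat())
# es2050 2026-06-04T11:23:00
```

Splitting from the right keeps the host and the numeric field intact even if an operator name ever contained an underscore.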

Completed depooling of es1057 by marostegui@cumin1003: Upgrading es1057.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1057.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1057.eqiad.wmnet with OS trixie completed:

  • es1057 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606041325_marostegui_1118960_es1057.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2026-06-04T13:56:32Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Promote es2050 to es3 codfw primary T428050', diff saved to https://phabricator.wikimedia.org/P93878 and previous config saved to /var/cache/conftool/dbconfig/20260604-135631-marostegui.json
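Each primary promotion shows up as a SAL entry for a `dbctl commit`. That entry records the commit message, a link to the diff paste, and the path of the saved previous config. The sketch below extracts those fields with regular expressions so the promotions can be listed as an audit trail. It assumes the entry text has exactly the shape seen in this task and that the message follows the `Promote <host> to <section> <dc> primary <task>` pattern used here. `DbctlCommit`, `parse_sal` and `parse_promotion` are hypothetical helpers, not part of conftool or dbctl.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

class DbctlCommit(NamedTuple):
    when: datetime         # SAL timestamps end in "Z", i.e. UTC
    user: str
    from_host: str
    dc: str                # the commit scope, e.g. "all"
    message: str
    diff_url: str
    previous_config: str

SAL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+<(?P<user>[^@>]+)@(?P<from_host>[^>]+)>\s+"
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<message>[^']*)', "
    r"diff saved to (?P<diff_url>\S+) and previous config saved to (?P<previous_config>\S+)"
)
PROMOTE_RE = re.compile(
    r"^Promote (?P<host>\S+) to (?P<section>\S+) (?P<dc>\S+) primary (?P<task>T\d+)$"
)

def parse_sal(line: str) -> DbctlCommit:
    m = SAL_RE.search(line)
    if m is None:
        raise ValueError("not a dbctl commit SAL entry")
    when = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return DbctlCommit(when, m["user"], m["from_host"], m["dc"], m["message"],
                       m["diff_url"], m["previous_config"])

def parse_promotion(message: str) -> dict[str, str]:
    m = PROMOTE_RE.match(message)
    if m is None:
        raise ValueError("commit message is not a promotion")
    return m.groupdict()

entry = ("Mentioned in SAL (#wikimedia-operations) [2026-06-04T13:56:32Z] "
         "<marostegui@cumin1003> dbctl commit (dc=all): 'Promote es2050 to es3 codfw "
         "primary T428050', diff saved to https://phabricator.wikimedia.org/P93878 and "
         "previous config saved to /var/cache/conftool/dbconfig/20260604-135631-marostegui.json")
commit = parse_sal(entry)
print(parse_promotion(commit.message))
# {'host': 'es2050', 'section': 'es3', 'dc': 'codfw', 'task': 'T428050'}
```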

Completed depooling of es1054 by marostegui@cumin1003: Upgrading es1054.eqiad.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1054.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1054.eqiad.wmnet with OS trixie completed:

  • es1054 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606050545_marostegui_1236596_es1054.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Completed depooling of es2052 by marostegui@cumin1003: Upgrading es2052.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2052.codfw.wmnet with OS trixie

Mentioned in SAL (#wikimedia-operations) [2026-06-08T05:31:56Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Promote es1054 to es3 eqiad primary T428050', diff saved to https://phabricator.wikimedia.org/P93895 and previous config saved to /var/cache/conftool/dbconfig/20260608-053156-marostegui.json

Change #1298421 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/dns@master] wmnet: Update es3-master CNAME

https://gerrit.wikimedia.org/r/1298421

Change #1298421 merged by Marostegui:

[operations/dns@master] wmnet: Update es3-master CNAME

https://gerrit.wikimedia.org/r/1298421
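Once the CNAME change is merged, it is worth confirming that the alias resolves to the intended host. The sketch below relies on `socket.gethostbyname_ex`, which returns a `(canonical_name, aliases, addresses)` tuple, and compares the canonical name against the expected target. The task does not give the full alias name or its target, so both names in the usage are hypothetical. The resolver is also a parameter, which lets the example run offline against a stub.

```python
import socket
from typing import Callable

# socket.gethostbyname_ex returns (canonical_name, alias_list, address_list).
Resolver = Callable[[str], tuple[str, list[str], list[str]]]

def _norm(name: str) -> str:
    # DNS names are case-insensitive; ignore a trailing root dot.
    return name.rstrip(".").lower()

def cname_points_to(alias: str, expected_target: str,
                    resolve: Resolver = socket.gethostbyname_ex) -> bool:
    """True if resolving `alias` yields `expected_target` as its canonical name."""
    canonical, _aliases, _addresses = resolve(alias)
    return _norm(canonical) == _norm(expected_target)

# Offline usage with a stub resolver; both host names are hypothetical examples.
def fake_resolver(name: str) -> tuple[str, list[str], list[str]]:
    return ("es1054.eqiad.wmnet", [name], ["192.0.2.10"])

print(cname_points_to("es3-master.eqiad.wmnet", "es1054.eqiad.wmnet", fake_resolver))
# True
```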

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2052.codfw.wmnet with OS trixie completed:

  • es2052 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606080539_marostegui_1786432_es2052.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1051.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1051.eqiad.wmnet with OS trixie completed:

  • es1051 (WARN)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606080620_marostegui_1789103_es1051.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
    • Updated Netbox data from PuppetDB
Marostegui updated the task description.

All done
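The timeline above follows a fixed order. The replicas are reimaged first, a reimaged host is then promoted to primary in each datacenter, and the previous primary is reimaged last. The sketch below checks a sequence of events against two rules: a host is promoted only after it has been reimaged, and a datacenter's old primary is reimaged only after a promotion in that datacenter. The event list copies the order recorded in this task, with each reimage placed at its start time. es2052 and es1051 are treated as the former primaries. That is an assumption inferred from their being reimaged after the promotions, because the task does not say so. `check_order` is a hypothetical helper.

```python
# Each event is (action, host, datacenter), in the order the task records them.
Event = tuple[str, str, str]

TIMELINE: list[Event] = [
    ("reimage", "es2057", "codfw"),
    ("reimage", "es2050", "codfw"),
    ("reimage", "es1057", "eqiad"),
    ("promote", "es2050", "codfw"),
    ("reimage", "es1054", "eqiad"),
    ("reimage", "es2052", "codfw"),
    ("promote", "es1054", "eqiad"),
    ("reimage", "es1051", "eqiad"),
]

# ASSUMPTION: former primaries inferred from the timeline, not stated in the task.
OLD_PRIMARIES = {"codfw": "es2052", "eqiad": "es1051"}

def check_order(events: list[Event], old_primaries: dict[str, str]) -> list[str]:
    """Return a list of ordering problems; an empty list means the order is safe."""
    reimaged: set[str] = set()
    promoted_dcs: set[str] = set()
    problems: list[str] = []
    for action, host, dc in events:
        if action == "reimage":
            if host == old_primaries.get(dc) and dc not in promoted_dcs:
                problems.append(f"{host} reimaged while still {dc} primary")
            reimaged.add(host)
        elif action == "promote":
            if host not in reimaged:
                problems.append(f"{host} promoted before being reimaged")
            promoted_dcs.add(dc)
        else:
            raise ValueError(f"unknown action {action!r}")
    return problems

print(check_order(TIMELINE, OLD_PRIMARIES))
# []
```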