- es2056
- es2054
- es2049
- es1056
- es1053
- es1049
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| wmnet: Update es2-master alias | operations/dns | master | +1 -1 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | Marostegui | T422365 Migration to Debian Trixie of production database-related hosts |
| Resolved | | Marostegui | T427875 Migrate es2 section to Debian Trixie |
Event Timeline
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2056.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2056.codfw.wmnet with OS trixie completed:
- es2056 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606021028_marostegui_4060565_es2056.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2049.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2049.codfw.wmnet with OS trixie completed:
- es2049 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606021211_marostegui_4137631_es2049.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1056.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1056.eqiad.wmnet with OS trixie completed:
- es1056 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030539_marostegui_509014_es1056.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2026-06-03T06:46:24Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Promote es2056 to es2 codfw primary T427875', diff saved to https://phabricator.wikimedia.org/P93632 and previous config saved to /var/cache/conftool/dbconfig/20260603-064623-marostegui.json
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1049.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1049.eqiad.wmnet with OS trixie completed:
- es1049 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030714_marostegui_520450_es1049.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Change #1296952 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/dns@master] wmnet: Update es2-master alias
Mentioned in SAL (#wikimedia-operations) [2026-06-03T07:42:50Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Promote es1056 to es2 eqiad primary T427875', diff saved to https://phabricator.wikimedia.org/P93637 and previous config saved to /var/cache/conftool/dbconfig/20260603-074250-marostegui.json
Change #1296952 merged by Marostegui:
[operations/dns@master] wmnet: Update es2-master alias
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2054.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2054.codfw.wmnet with OS trixie completed:
- es2054 (WARN)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030814_marostegui_531258_es2054.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Skipping waiting for Icinga optimal status and not removing the downtime, --no-check-icinga was set
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es1053.eqiad.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es1053.eqiad.wmnet with OS trixie executed with errors:
- es1053 (FAIL)
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606030941_marostegui_606482_es1053.out
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es1053.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
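For anyone cross-referencing the runs above, the per-host log paths appear to follow a regular naming pattern: a 12-digit timestamp, the user, a numeric field, and the short hostname. The following is a minimal sketch, not part of spicerack, that splits such a path into its components. The names `ReimageLog` and `parse_reimage_log` are hypothetical, and reading the numeric field as some kind of run or process identifier is an assumption, since the timeline does not say what it is.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    """Components of a reimage log file name (hypothetical helper type)."""
    started: datetime
    user: str
    run_id: int  # assumption: the meaning of this numeric field is not documented here
    host: str


# Pattern inferred from the paths quoted in the timeline, e.g.
# 202606030941_marostegui_606482_es1053.out
_LOG_NAME = re.compile(
    r"(?P<ts>\d{12})_(?P<user>[^_/]+)_(?P<run_id>\d+)_(?P<host>[^_/]+)\.out$"
)


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into timestamp, user, numeric id and host."""
    match = _LOG_NAME.search(path)
    if match is None:
        raise ValueError(f"unrecognised reimage log name: {path!r}")
    return ReimageLog(
        started=datetime.strptime(match["ts"], "%Y%m%d%H%M"),
        user=match["user"],
        run_id=int(match["run_id"]),
        host=match["host"],
    )


if __name__ == "__main__":
    failed = "/var/log/spicerack/sre/hosts/reimage/202606030941_marostegui_606482_es1053.out"
    print(parse_reimage_log(failed))
```

This is only useful for sorting or grouping the log files by host and start time; the actual failure details for es1053 still have to be read from the log itself.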