We should probably start with a host in the test-s4 cluster for the initial test run.
Initially with MariaDB 10.11.
Event Timeline
Change #1196628 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] mariadb: Define mariadb packages for trixie
Change #1196628 merged by Marostegui:
[operations/puppet@production] mariadb: Define mariadb packages for trixie
Change #1196893 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] packages_wmf,packages_client.pp: Add trixie
Change #1196893 merged by Marostegui:
[operations/puppet@production] packages_wmf,packages_client.pp: Add trixie
Mentioned in SAL (#wikimedia-operations) [2025-10-20T07:28:33Z] <marostegui> Stop MariaDB on es2032 to clone sretest2003 T407472
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db-test1003.eqiad.wmnet with OS trixie
db-test1003 is now installed with trixie and MariaDB 10.11.
I will install a vanilla MariaDB database there just to run a few tests with Puppet, configuration, etc.
Change #1197629 had a related patch set uploaded (by Federico Ceratto; author: Federico Ceratto):
[operations/puppet@production] aptrepo: enable wmfmariadbpy for Trixie
Change #1197629 merged by Federico Ceratto:
[operations/puppet@production] aptrepo: enable wmfmariadbpy for Trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db-test1003.eqiad.wmnet with OS trixie executed with errors:
- db-test1003 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata (7) to Debian installer
- Set boot media to disk
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510210923_marostegui_1835076_db-test1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console db-test1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
This was due to T407845; the reimage itself went fine, and the failure is just that Puppet issue.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie
Change #1200049 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] installserver: Format /srv/ in es2028
Change #1200049 merged by Marostegui:
[operations/puppet@production] installserver: Format /srv/ in es2028
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:
- es2028 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata (7) to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510301203_marostegui_1349427_es2028.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie completed:
- es2028 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata (7) to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510301320_marostegui_1359934_es2028.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change #1201996 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] instances.yaml: Add es1033 to dbctl
Change #1201996 merged by Marostegui:
[operations/puppet@production] instances.yaml: Add es1033 to dbctl
Mentioned in SAL (#wikimedia-operations) [2025-11-05T07:16:06Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Add es1033 to es2 depooled T409257 T407472', diff saved to https://phabricator.wikimedia.org/P84834 and previous config saved to /var/cache/conftool/dbconfig/20251105-071605-marostegui.json
Change #1202002 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] es1033: Enable notifications
Change #1202002 merged by Marostegui:
[operations/puppet@production] es1033: Enable notifications
Change #1218652 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] isntallserver: Do not format /srv on es2028
Change #1218652 merged by Marostegui:
[operations/puppet@production] isntallserver: Do not format /srv on es2028
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:
- es2028 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:
- es2028 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:
- es2028 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced UEFI HTTP Boot for next reboot
- Host rebooted via Redfish
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie executed with errors:
- es2028 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced UEFI HTTP Boot for next reboot
- Host rebooted via Redfish
- Host up (Debian installer)
- Add puppet_version metadata (7) to Debian installer
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2028.codfw.wmnet" to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host es2028.codfw.wmnet with OS trixie completed:
- es2028 (WARN)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced UEFI HTTP Boot for next reboot
- Host rebooted via Redfish
- Host up (Debian installer)
- Add puppet_version metadata (7) to Debian installer
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512220709_marostegui_3563158_es2028.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
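The reimage reports above all share the same shape: a host line with a status in parentheses, followed by one bullet per completed step. To compare where each attempt stopped, something like the following could be used. This is only a rough sketch: `summarize_report` is a hypothetical helper, not part of the cookbook or spicerack, and the report format is inferred purely from the entries in this timeline.

```python
import re

# Assumption: the first bullet of a report is "- <host> (<STATUS>)",
# as seen in the entries above (FAIL and WARN are the only values observed here).
STATUS_RE = re.compile(r"^- (?P<host>\S+) \((?P<status>[A-Z]+)\)$")


def summarize_report(lines):
    """Return (host, status, last_step) for one pasted reimage report."""
    host = status = last_step = None
    for raw in lines:
        line = raw.strip()
        match = STATUS_RE.match(line)
        if match:
            host, status = match["host"], match["status"]
        elif line.startswith("- ") and not line.startswith("- The reimage failed"):
            # Remember the most recent completed step; the trailing
            # "The reimage failed" notice is not a step, so it is skipped.
            last_step = line[2:]
    return host, status, last_step


# One of the failed es2028 attempts from this task, pasted verbatim
# (the failure notice is shortened).
report = [
    "- es2028 (FAIL)",
    "- Removed from Puppet and PuppetDB if present and deleted any certificates",
    "- Removed from Debmonitor if present",
    "- Forced UEFI HTTP Boot for next reboot",
    "- Host rebooted via Redfish",
    "- The reimage failed, see the cookbook logs for the details.",
]

print(summarize_report(report))
# ('es2028', 'FAIL', 'Host rebooted via Redfish')
```

Run against each report in turn, this would show at a glance that the failed attempts stopped at different points (after the reboot, after the Debian installer came up, or after the post-install reboot) without rereading the full step lists.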