Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Marostegui | T339185 Test MariaDB + Debian bookworm on databases |
| Resolved | | Marostegui | T339835 Install Debian Bookworm on a DB |
Event Timeline
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bookworm executed with errors:
- db1124 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bullseye executed with errors:
- db1124 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306190753_marostegui_2592974_db1124.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bookworm executed with errors:
- db1124 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
Change 931232 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] d-i: Fix retrieval of reuse-parts.sh for bookworm
Change 931232 merged by Muehlenhoff:
[operations/puppet@production] d-i: Fix retrieval of reuse-parts.sh for bookworm
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1124.eqiad.wmnet with OS bookworm completed:
- db1124 (WARN)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306191004_marostegui_2619288_db1124.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Mentioned in SAL (#wikimedia-operations) [2023-06-19T12:21:00Z] <moritzm> uploaded wmfmariadbpy 0.10+deb12u1 T339835
For posterity, this got fixed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/931232
Change 931493 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] install_server: Reimage db1124
Change 931493 merged by Marostegui:
[operations/puppet@production] install_server: Reimage db1124
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1119.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1119.eqiad.wmnet with OS bookworm executed with errors:
- db1119 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1119.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1119.eqiad.wmnet with OS bookworm completed:
- db1119 (WARN)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202306200729_marostegui_2855360_db1119.out, asking the operator what to do
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202306200814_marostegui_2855360_db1119.out, asking the operator what to do
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306200820_marostegui_2855360_db1119.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 953555 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] mariadb: Move db1119 to s1
Change 953555 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1119 to s1
Change 966524 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/puppet@production] d-i: Fix retrieval of reuse-parts-test.sh for bookworm
Change 966524 merged by Elukey:
[operations/puppet@production] d-i: Fix retrieval of reuse-parts-test.sh for bookworm
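The reimage attempts in this timeline can be tallied mechanically. The following is a minimal sketch, not part of the cookbook or of Spicerack: it assumes the result lines keep the exact wording seen above ("started by ... for host ... with OS ... completed:" or "executed with errors:"), and the names `RESULT_RE` and `summarize` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumption: result lines follow the wording seen in this timeline.
# The "was started by" announcement lines are deliberately not matched,
# because "reimage was started" does not fit "reimage started".
RESULT_RE = re.compile(
    r"Cookbook cookbooks\.sre\.hosts\.reimage started by (?P<user>\S+)@(?P<origin>\S+) "
    r"for host (?P<host>\S+) with OS (?P<os>\S+) "
    r"(?P<outcome>completed|executed with errors):"
)


def summarize(lines):
    """Count reimage results, keyed by (host, os, outcome)."""
    tally = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            tally[(match["host"], match["os"], match["outcome"])] += 1
    return tally


# Three lines copied from the timeline: one announcement and two results.
sample = [
    "Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 "
    "for host db1124.eqiad.wmnet with OS bookworm",
    "Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 "
    "for host db1124.eqiad.wmnet with OS bookworm executed with errors:",
    "Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 "
    "for host db1124.eqiad.wmnet with OS bookworm completed:",
]

if __name__ == "__main__":
    for (host, os_name, outcome), count in sorted(summarize(sample).items()):
        print(host, os_name, outcome, count)
```

Keying on the host, OS and outcome together keeps the failed bullseye and bookworm attempts on db1124 separate from the later successful bookworm run.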