See parent task T289135 for more details
Description
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2022-05-19T19:58:16Z] <inflatador> bking@relforge1004: banned relforge1003 from main and alpha clusters in preparation for reimage T308770
Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host relforge1003.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host relforge1003.eqiad.wmnet with OS bullseye completed:
- relforge1003 (WARN)
  - Downtimed on Icinga/Alertmanager
  - Disabled Puppet
  - Removed from Puppet and PuppetDB if present
  - Deleted any existing Puppet certificate
  - Removed from Debmonitor if present
  - Forced PXE for next reboot
  - Host rebooted via IPMI
  - Host up (Debian installer)
  - Host up (new fresh bullseye OS)
  - Generated Puppet certificate
  - Signed new Puppet certificate
  - Run Puppet in NOOP mode to populate exported resources in PuppetDB
  - Found Nagios_host resource for this host in PuppetDB
  - Downtimed the new host on Icinga/Alertmanager
  - Removed previous downtime on Alertmanager (old OS)
  - First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202205232139_bking_1173818_relforge1003.out
  - Checked BIOS boot parameters are back to normal
  - configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
  - Rebooted
  - Automatic Puppet run was successful
  - Forced a re-check of all Icinga services for the host
  - Icinga status is not optimal, downtime not removed
  - Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin1001 for host relforge1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin1001 for host relforge1004.eqiad.wmnet with OS bullseye completed:
- relforge1004 (WARN)
  - Downtimed on Icinga/Alertmanager
  - Disabled Puppet
  - Removed from Puppet and PuppetDB if present
  - Deleted any existing Puppet certificate
  - Removed from Debmonitor if present
  - Forced PXE for next reboot
  - Host rebooted via IPMI
  - Host up (Debian installer)
  - Host up (new fresh bullseye OS)
  - Generated Puppet certificate
  - Signed new Puppet certificate
  - Run Puppet in NOOP mode to populate exported resources in PuppetDB
  - Found Nagios_host resource for this host in PuppetDB
  - Downtimed the new host on Icinga/Alertmanager
  - Removed previous downtime on Alertmanager (old OS)
  - First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202205240106_ryankemper_1208841_relforge1004.out
  - Checked BIOS boot parameters are back to normal
  - configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
  - Rebooted
  - Automatic Puppet run was successful
  - Forced a re-check of all Icinga services for the host
  - Icinga status is not optimal, downtime not removed
  - Updated Netbox data from PuppetDB
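
Both reimage runs above finished with a WARN status and the step "Icinga status is not optimal, downtime not removed". When reviewing several such comments, it can help to extract the host, overall status, and step list programmatically. The following is a minimal sketch, not part of the cookbook or Spicerack: the names `ReimageReport` and `parse_report` are hypothetical, the sample text is abridged from the first completed run, and the regular expressions assume the comment layout shown in this task.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches the first line of a "completed" cookbook comment as it appears above.
HEADER_RE = re.compile(
    r"^Cookbook (?P<cookbook>\S+) started by (?P<user>[^@\s]+)@(?P<origin>\S+) "
    r"for host (?P<host>\S+) with OS (?P<os>\S+) completed:$"
)
# Matches the per-host summary item, e.g. "relforge1003 (WARN)".
STATUS_RE = re.compile(r"^(?P<name>\S+) \((?P<status>[A-Z]+)\)$")


@dataclass
class ReimageReport:
    """Hypothetical container for one completed reimage comment."""
    cookbook: str
    user: str
    host: str
    os: str
    status: str = ""
    steps: List[str] = field(default_factory=list)


def parse_report(text: str) -> ReimageReport:
    """Parse one completed reimage comment into a ReimageReport."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty comment")
    header = HEADER_RE.match(lines[0])
    if header is None:
        raise ValueError("not a completed reimage comment")
    report = ReimageReport(
        header["cookbook"], header["user"], header["host"], header["os"]
    )
    for line in lines[1:]:
        # Drop the list marker; indentation was already stripped above.
        item = line[2:] if line.startswith("- ") else line
        status = STATUS_RE.match(item)
        if status and not report.status:
            report.status = status["status"]
        else:
            report.steps.append(item)
    return report


# Abridged sample taken from the first completed run in this task.
SAMPLE = """\
Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host relforge1003.eqiad.wmnet with OS bullseye completed:
- relforge1003 (WARN)
  - Downtimed on Icinga/Alertmanager
  - Disabled Puppet
  - Icinga status is not optimal, downtime not removed
"""

report = parse_report(SAMPLE)
print(report.host, report.status, len(report.steps))
```

The parser strips indentation before matching, so it accepts the step list whether or not the steps are nested under the host line.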