Bitu should be deployed to production using deb packages.
Staging will remain on Git deployments.
Event Timeline
Change 956836 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] P:idm allow for installation via Debian packages.
Change 957669 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] WIP: P:idm switch idm2001 to Debian package
Plan for testing rollout of Debian packages:
Upgrade test to Bookworm:
Pre-update:
- Set idm-test1001 in maintenance mode
- Merge patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/956836
- Verify that production remains running as expected
Reimage idm-test1001
ssh cumin1001.eqiad.wmnet sudo cookbook sre.hosts.reimage --os bookworm -t T340721 idm-test1001
IDM2001 upgrade
- Disable Puppet on idm2001.wikimedia.org
- Reimage idm2001.wikimedia.org
- Merge patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/957669
ssh cumin1001.eqiad.wmnet
sudo cumin 'idm2001.wikimedia.org' "disable-puppet 'bitu deb install - slyngshede'"
sudo cookbook sre.hosts.reimage --os bookworm -t T340721 idm2001
Switch over to IDM2001
- Merge: https://gerrit.wikimedia.org/r/c/operations/dns/+/957674
- Follow DNS deployment: https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile
WAIT AND ALLOW ANY BUGS TO REVEAL THEMSELVES
IDM1001 Upgrade
- Disable Puppet on idm1001.wikimedia.org
- Merge: https://gerrit.wikimedia.org/r/c/operations/puppet/+/957676
- Reimage idm1001.wikimedia.org
ssh cumin1001.eqiad.wmnet
sudo cumin 'idm1001.wikimedia.org' "disable-puppet 'bitu deb install - slyngshede'"
sudo cookbook sre.hosts.reimage --os bookworm -t T340721 idm1001
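The per-host steps in the plan above (disable Puppet, then reimage) repeat for idm2001 and idm1001, so they can be sketched as a small dry-run helper to run on the cumin host. This is only an illustration: the function name `plan_reimage` and the `TASK` and `REASON` variables are made up for this sketch and are not part of any existing tooling. It prints the commands from the plan rather than executing them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the per-host commands from the rollout plan.
# plan_reimage, TASK and REASON are hypothetical names used for illustration only.
set -euo pipefail

TASK="T340721"
REASON="bitu deb install - slyngshede"

plan_reimage() {
    local fqdn="$1"
    local short="${fqdn%%.*}"   # idm2001.wikimedia.org -> idm2001

    # Step 1: disable Puppet on the target host, with a reason string.
    printf "sudo cumin '%s' \"disable-puppet '%s'\"\n" "$fqdn" "$REASON"
    # Step 2: reimage the host to Bookworm, referencing the task.
    printf "sudo cookbook sre.hosts.reimage --os bookworm -t %s %s\n" "$TASK" "$short"
}

# Codfw host first, then eqiad, matching the order of the plan.
plan_reimage idm2001.wikimedia.org
plan_reimage idm1001.wikimedia.org
```

The merge of the matching Puppet patch and the DNS switchover stay manual steps between the two hosts. They are deliberately left out of the helper.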
Change 957674 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/dns@master] IDM Switchover
Change 957676 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] IDM: Deploy deb to idm1001.
Change 956836 merged by Slyngshede:
[operations/puppet@production] P:idm allow for installation via Debian packages.
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm completed:
- idm-test1001 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309141210_slyngshede_31369_idm-test1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm completed:
- idm-test1001 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309151319_slyngshede_376144_idm-test1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 957669 merged by Slyngshede:
[operations/puppet@production] P:idm switch idm2001 to Debian package
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm2001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm2001.wikimedia.org with OS bookworm completed:
- idm2001 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202309190829_slyngshede_3921709_idm2001.out, asking the operator what to do
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202309190831_slyngshede_3921709_idm2001.out, asking the operator what to do
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309190845_slyngshede_3921709_idm2001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change 957676 merged by Slyngshede:
[operations/puppet@production] P:IDM: Failover Redis
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm1001.wikimedia.org with OS bookworm completed:
- idm1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309200724_slyngshede_4193245_idm1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm completed:
- idm-test1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309201331_slyngshede_78977_idm-test1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB