Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T291916 Tracking task for Bullseye migrations in production |
| Resolved | | Dzahn | T327068 Bullseye upgrade for remaining Collab hosts |
| Resolved | | Dzahn | T348392 Migrate planet servers to bullseye or bookworm |
| Resolved | | Dzahn | T351849 Site: 2 VMs %request for planet |
Event Timeline
cookbook [GLOBAL_ARGS] sre.ganeti.makevm: error: argument --memory: Memory must be at least 1.5G
Oh really? Well then, 1.5G. But we used to have VMs with 256MB, didn't we?
sudo cookbook sre.ganeti.makevm --vcpus 1 --memory 1.5G ... .. error: argument --memory: invalid validate_memory value: '1.5G'
sudo cookbook sre.ganeti.makevm --vcpus 1 --memory 1.5 ... ... error: argument --memory: Memory must be at least 1.5G
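The two different error texts above match how Python's argparse reports failures from a `type=` callable: a `ValueError` becomes "invalid <function name> value: '<input>'", while an `argparse.ArgumentTypeError` is printed with its own message. The following is a minimal sketch of that mechanism only, not the cookbook's actual code. The body of `validate_memory`, the `MIN_MEMORY_GB` constant and the `makevm-sketch` parser are assumptions made for illustration. The log does not show why a bare `1.5` was also rejected, and this sketch does not try to reproduce that.

```python
import argparse

# Assumed minimum in GB, taken from the error message in the log above.
MIN_MEMORY_GB = 1.5


def validate_memory(value: str) -> float:
    """Hypothetical validator: parse a memory size given as a bare number of GB."""
    # float() raises ValueError for a suffixed string such as '1.5G'.
    # argparse turns that into: invalid validate_memory value: '1.5G'
    memory = float(value)
    if memory < MIN_MEMORY_GB:
        # argparse prints the message of an ArgumentTypeError verbatim.
        raise argparse.ArgumentTypeError(
            f"Memory must be at least {MIN_MEMORY_GB}G"
        )
    return memory


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser with only the --memory argument."""
    parser = argparse.ArgumentParser(prog="makevm-sketch")
    parser.add_argument("--memory", type=validate_memory, default=MIN_MEMORY_GB)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"memory: {args.memory}G")
```

Running the sketch with `--memory 1.5G` exits with the "invalid validate_memory value" error, and with `--memory 1` it exits with the "Memory must be at least 1.5G" error.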
Change 976858 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] site: add planet[12]003 with insetup role
Change 976858 merged by Dzahn:
[operations/puppet@production] site: add planet[12]003 with insetup role
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm
Change 976867 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] hieradata: set planet[12]003 to use puppet7
Change 976867 merged by Dzahn:
[operations/puppet@production] hieradata: set planet[12]003 to use puppet7
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm executed with errors:
- planet1003 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311222119_dzahn_1639445_planet1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1001 for host planet2003.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm executed with errors:
- planet1003 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311222224_dzahn_1679959_planet1003.out
- Unable to run puppet on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1001 for host planet2003.codfw.wmnet with OS bookworm executed with errors:
- planet2003 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311222305_dzahn_1693000_planet2003.out
- Unable to run puppet on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1001 for host planet2003.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm executed with errors:
- planet1003 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311230034_dzahn_1743638_planet1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1001 for host planet2003.codfw.wmnet with OS bookworm executed with errors:
- planet2003 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311230044_dzahn_1749349_planet2003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1001 for host planet1003.eqiad.wmnet with OS bookworm completed:
- planet1003 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311271825_dzahn_476197_planet1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
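Every failed run above ends right after the "Rebooted" step, while the successful run continues past it. As a rough illustration, the sketch below compares the tail of one failed run with the tail of the passing run and lists the steps the failed run never reached. The step strings are copied from the summaries above, abbreviated to the last few steps of each run. The helper name `steps_not_reached` is made up for this example. The comparison does not show the cause of the failures, which is only in the cookbook logs.

```python
from typing import List

# Last steps of a failed run, copied from the summaries above.
FAILED_TAIL = [
    "configmaster.wikimedia.org updated with the host new SSH public key "
    "for wmf-update-known-hosts-production",
    "Rebooted",
    "The reimage failed, see the cookbook logs for the details",
]

# Last steps of the successful planet1003 run, copied from the summary above.
PASSED_TAIL = [
    "configmaster.wikimedia.org updated with the host new SSH public key "
    "for wmf-update-known-hosts-production",
    "Rebooted",
    "Automatic Puppet run was successful",
    "Forced a re-check of all Icinga services for the host",
    "Icinga status is optimal",
    "Icinga downtime removed",
    "Updated Netbox data from PuppetDB",
]


def steps_not_reached(failed: List[str], passed: List[str]) -> List[str]:
    """Return the steps of the passing run that follow the shared prefix."""
    shared = 0
    for failed_step, passed_step in zip(failed, passed):
        if failed_step != passed_step:
            break
        shared += 1
    return passed[shared:]


if __name__ == "__main__":
    for step in steps_not_reached(FAILED_TAIL, PASSED_TAIL):
        print(f"- {step}")
```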