There is new hardware here to replace cloudgw2002-dev.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Unknown Object (Task) | | | |
| Resolved | | Andrew | T418765 cloudgw2004-dev service implementation |
Event Timeline
Change #1248004 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] site: Use nftables insetup role for cloudgw2004-dev
Change #1248004 merged by Andrew Bogott:
[operations/puppet@production] site: Use nftables insetup role for cloudgw2004-dev
Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudgw2004-dev.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudgw2004-dev.codfw.wmnet with OS trixie executed with errors:
- cloudgw2004-dev (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced UEFI HTTP Boot for next reboot
- Host rebooted via Redfish
- Host up (Debian installer)
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603092108_andrew_964969_cloudgw2004-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console cloudgw2004-dev.codfw.wmnet" to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudgw2004-dev.codfw.wmnet with OS trixie
Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudgw2004-dev.codfw.wmnet with OS trixie completed:
- cloudgw2004-dev (WARN)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced UEFI HTTP Boot for next reboot
- Host rebooted via Redfish
- Host up (Debian installer)
- Host up (new fresh trixie OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603092144_andrew_975213_cloudgw2004-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change #1250638 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Replace cloudgw2002-dev with cloudgw2004-dev
Change #1250638 merged by Andrew Bogott:
[operations/puppet@production] Replace cloudgw2002-dev with cloudgw2004-dev