Tracking task so that we do not forget about them, from T344598#9114384
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | ayounsi | T300152 Investigate Ganeti in routed mode
Resolved | | RobH | T345602 Repurpose three decom servers as temporary ganeti-test1001/1002 and ganeti-test2004
Resolved | Request | VRiley-WMF | T354680 decommission ganeti-test1001, ganeti-test1002
Resolved | Request | Jhancock.wm | T354681 decommission ganeti-test2004
Event Timeline
Change 954898 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Add temp ganeti-test hosts
Change 954898 merged by Ayounsi:
[operations/puppet@production] Add temp ganeti-test hosts
Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye executed with errors:
- ganeti-test1001 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test1002.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye completed:
- ganeti-test1001 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309051217_ayounsi_3956565_ganeti-test1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test1002.eqiad.wmnet with OS bullseye completed:
- ganeti-test1002 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309051247_ayounsi_3988529_ganeti-test1002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
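For readers comparing the summaries, here is a minimal sketch (not part of the cookbook) that finds the first step a failed run never reached by comparing it with a passing run. The step lists are copied from the two ganeti-test1001 summaries earlier in this timeline; the function name `first_missing_step` is hypothetical.

```python
# Steps copied from the failed ganeti-test1001 reimage summary.
FAILED_RUN = [
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via IPMI",
    "Host up (Debian installer)",
    "Checked BIOS boot parameters are back to normal",
    "The reimage failed, see the cookbook logs for the details",
]

# First steps copied from the passing ganeti-test1001 reimage summary.
PASSED_RUN = [
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via IPMI",
    "Host up (Debian installer)",
    "Checked BIOS boot parameters are back to normal",
    "Host up (new fresh bullseye OS)",
    "Generated Puppet certificate",
]


def first_missing_step(failed, passed):
    """Return the first step of the passing run that the failed run did not log."""
    for index, step in enumerate(passed):
        if index >= len(failed) or failed[index] != step:
            return step
    return None


print(first_missing_step(FAILED_RUN, PASSED_RUN))
# prints: Host up (new fresh bullseye OS)
```

This only shows where the run stopped (the host never came up on the freshly installed OS), not why; the cause is in the cookbook logs the summary points to.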
Change 954924 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[labs/private@master] Add mock TLS key for ganeti-test01.svc.eqiad.wmnet
Change 954924 merged by Ayounsi:
[labs/private@master] Add mock TLS key for ganeti-test01.svc.eqiad.wmnet
Change 954946 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Add ganeti-test01.svc.eqiad.wmnet public cert
Change 954946 merged by Ayounsi:
[operations/puppet@production] Add ganeti-test01.svc.eqiad.wmnet public cert
Change 954950 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] ganeti-test100[12]: assign ganeti-test role
Change 954950 merged by Ayounsi:
[operations/puppet@production] ganeti-test100[12]: assign ganeti-test role
Change 955284 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] eqiad ganeti test setup
Change 955284 merged by Ayounsi:
[operations/puppet@production] eqiad ganeti test setup
Change 955299 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/dns@master] DNS add A for ganeti-test01.svc.eqiad.wmnet
Change 955299 merged by Ayounsi:
[operations/dns@master] DNS add A for ganeti-test01.svc.eqiad.wmnet
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:
- ganeti-test2004 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye
I'm still having trouble getting ganeti-test2004 to behave. The mgmt and network addresses ping. The iDRAC is inaccessible through the web GUI, but is accessible through the root login. The iDRAC version is 2.83.83.83.
I tried to install the OS anyway, but it fails when it tries to set up a RAID; there is no RAID controller on this one. Are there any tips for fixing these problems? Thanks. ++ @Papaul
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:
- ganeti-test2004 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
Change 963998 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] ganeti-test2004: add to Puppet
Change 963998 merged by Ayounsi:
[operations/puppet@production] ganeti-test2004: add to Puppet
Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:
- ganeti-test2004 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:
- ganeti-test2004 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details
Change 964036 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Change ganeti-test2004's role to ganeti_test
Change 964036 merged by Ayounsi:
[operations/puppet@production] Change ganeti-test2004's role to ganeti_test
cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: ganeti-test[1001-1002].eqiad.wmnet
- ganeti-test1001.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- ganeti-test1002.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
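The decommission entry above names its targets with a bracketed range, `ganeti-test[1001-1002].eqiad.wmnet`. As a simplified sketch of how such an expression maps to individual hosts, the hypothetical helper `expand_hosts` below handles a single numeric range only; it is an illustration, not the parser the cookbook uses.

```python
import re


def expand_hosts(expression):
    """Expand one '[start-end]' numeric range in a host expression into hostnames."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", expression)
    if match is None:
        # No bracketed range: treat the expression as a single hostname.
        return [expression]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep any zero padding from the range start
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


print(expand_hosts("ganeti-test[1001-1002].eqiad.wmnet"))
# prints: ['ganeti-test1001.eqiad.wmnet', 'ganeti-test1002.eqiad.wmnet']
```

The two expanded names match the two per-host result lists in the summary above.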
cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: ganeti-test2004.codfw.wmnet
- ganeti-test2004.codfw.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
Change 989212 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Remove mentions of ganeti-test1001/2 and 2004
Those 3 servers have been decommissioned. Over to DCops to finish the process.
Thanks, they've been quite useful.
Change 989212 merged by Ayounsi:
[operations/puppet@production] Remove mentions of ganeti-test1001/2 and 2004