Repurpose three decom servers as temporary ganeti-test1001/1002 and ganeti-test2004
Closed, Resolved (Public)

Description

Tracking task so that these servers are not forgotten; follow-up from T344598#9114384.

Event Timeline

Change 954898 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Add temp ganeti-test hosts

https://gerrit.wikimedia.org/r/954898

Change 954898 merged by Ayounsi:

[operations/puppet@production] Add temp ganeti-test hosts

https://gerrit.wikimedia.org/r/954898

Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye executed with errors:

  • ganeti-test1001 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test1001.eqiad.wmnet with OS bullseye completed:

  • ganeti-test1001 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309051217_ayounsi_3956565_ganeti-test1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test1002.eqiad.wmnet with OS bullseye completed:

  • ganeti-test1002 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309051247_ayounsi_3988529_ganeti-test1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change 954924 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[labs/private@master] Add mock TLS key for ganeti-test01.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/954924

Change 954924 merged by Ayounsi:

[labs/private@master] Add mock TLS key for ganeti-test01.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/954924

Change 954946 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Add ganeti-test01.svc.eqiad.wmnet public cert

https://gerrit.wikimedia.org/r/954946

Change 954946 merged by Ayounsi:

[operations/puppet@production] Add ganeti-test01.svc.eqiad.wmnet public cert

https://gerrit.wikimedia.org/r/954946

Change 954950 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] ganeti-test100[12]: assign ganeti-test role

https://gerrit.wikimedia.org/r/954950

Change 954950 merged by Ayounsi:

[operations/puppet@production] ganeti-test100[12]: assign ganeti-test role

https://gerrit.wikimedia.org/r/954950

Change 955284 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] eqiad ganeti test setup

https://gerrit.wikimedia.org/r/955284

Change 955284 merged by Ayounsi:

[operations/puppet@production] eqiad ganeti test setup

https://gerrit.wikimedia.org/r/955284

Change 955299 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/dns@master] DNS add A for ganeti-test01.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/955299

Change 955299 merged by Ayounsi:

[operations/dns@master] DNS add A for ganeti-test01.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/955299

ayounsi renamed this task from "Repurpose two decom servers as temporary ganeti-test1001/1002" to "Repurpose three decom servers as temporary ganeti-test1001/1002 and ganeti-test2004" (Sep 22 2023, 12:35 PM).
ayounsi updated the task description.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:

  • ganeti-test2004 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye

I'm still having trouble getting ganeti-test2004 to behave. The mgmt and network addresses ping. The iDRAC is inaccessible through the web GUI, but is accessible through the root login. The iDRAC version is 2.83.83.83.

I tried to install the OS anyway, but it fails when it tries to set up a RAID. There is no RAID controller on this one. Are there any tips for fixing these problems? Thanks ++ @Papaul

@Jhancock.wm First things first: you need to upgrade the iDRAC.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:

  • ganeti-test2004 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Change 963998 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] ganeti-test2004: add to Puppet

https://gerrit.wikimedia.org/r/963998

Change 963998 merged by Ayounsi:

[operations/puppet@production] ganeti-test2004: add to Puppet

https://gerrit.wikimedia.org/r/963998

Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:

  • ganeti-test2004 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ayounsi@cumin1001 for host ganeti-test2004.codfw.wmnet with OS bullseye executed with errors:

  • ganeti-test2004 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Change 964036 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Change ganeti-test2004's role to ganeti_test

https://gerrit.wikimedia.org/r/964036

Change 964036 merged by Ayounsi:

[operations/puppet@production] Change ganeti-test2004's role to ganeti_test

https://gerrit.wikimedia.org/r/964036

cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: ganeti-test[1001-1002].eqiad.wmnet

  • ganeti-test1001.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • ganeti-test1002.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: ganeti-test2004.codfw.wmnet

  • ganeti-test2004.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 989212 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Remove mentions of ganeti-test1001/2 and 2004

https://gerrit.wikimedia.org/r/989212

ayounsi added projects: ops-eqiad, ops-codfw.

Those 3 servers have been decommissioned. Over to DCops to finish the process.

Thanks, they've been quite useful.

RobH claimed this task.
RobH subscribed.

T354680 and T354681 filed for decom

Change 989212 merged by Ayounsi:

[operations/puppet@production] Remove mentions of ganeti-test1001/2 and 2004

https://gerrit.wikimedia.org/r/989212