This task will track the #decommission-hardware of servers scb200[1234].eqiad.wmnet.
With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.
**Steps for service owner:**
scb2001
[] - all system services confirmed offline from production use
[] - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (likely done by script)
[] - remove system from all lvs/pybal active configuration
[] - any service group puppet/hiera/dsh config removed
[] - remove site.pp entry, replace with role(spare::system). Recommended to ensure services are offline, but not 100% required as long as the decom script below is run IMMEDIATELY.
[] - log in to cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal, and run homer.
[] - remove all remaining puppet references and all host entries in the puppet repo
[] - reassign task from service owner to DC ops team member depending on site of server.
scb2002
[] - all system services confirmed offline from production use
[] - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (likely done by script)
[] - remove system from all lvs/pybal active configuration
[] - any service group puppet/hiera/dsh config removed
[] - remove site.pp entry, replace with role(spare::system). Recommended to ensure services are offline, but not 100% required as long as the decom script below is run IMMEDIATELY.
[] - log in to cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal, and run homer.
[] - remove all remaining puppet references and all host entries in the puppet repo
[] - reassign task from service owner to DC ops team member depending on site of server.
scb2003
[] - all system services confirmed offline from production use
[] - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (likely done by script)
[] - remove system from all lvs/pybal active configuration
[] - any service group puppet/hiera/dsh config removed
[] - remove site.pp entry, replace with role(spare::system). Recommended to ensure services are offline, but not 100% required as long as the decom script below is run IMMEDIATELY.
[] - log in to cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal, and run homer.
[] - remove all remaining puppet references and all host entries in the puppet repo
[] - reassign task from service owner to DC ops team member depending on site of server.
scb2004
[] - all system services confirmed offline from production use
[] - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (likely done by script)
[] - remove system from all lvs/pybal active configuration
[] - any service group puppet/hiera/dsh config removed
[] - remove site.pp entry, replace with role(spare::system). Recommended to ensure services are offline, but not 100% required as long as the decom script below is run IMMEDIATELY.
[] - log in to cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal, and run homer.
[] - remove all remaining puppet references and all host entries in the puppet repo
[] - reassign task from service owner to DC ops team member depending on site of server.
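Since the same cookbook invocation is repeated for each host, the command lines can be generated rather than typed by hand. The following is a minimal sketch, not an official tool: the helper name `decom_commands` is hypothetical, the task ID `T000000` is a placeholder to be replaced with the real Phabricator task, and the script only prints the commands so they can be reviewed and then run one at a time on a cumin host.

```python
import shlex


def decom_commands(hosts, task_id):
    """Build one decommission cookbook command line per host FQDN."""
    return [
        shlex.join(["cookbook", "sre.hosts.decommission", fqdn, "-t", task_id])
        for fqdn in hosts
    ]


# Hosts named in this task (scb200[1234].eqiad.wmnet expanded by hand).
HOSTS = [f"scb200{n}.eqiad.wmnet" for n in range(1, 5)]

if __name__ == "__main__":
    # "T000000" is a placeholder, not a real task ID.
    for command in decom_commands(HOSTS, "T000000"):
        print(command)
```

Printing instead of executing keeps a human in the loop, which matters here because the cookbook wipes the bootloader and powers the host down.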
**End service owner steps / Begin DC-Ops team steps:**
scb2001
[] - system disks removed (by onsite)
[] - determine system age, under 5 years are reclaimed to spare, over 5 years are decommissioned.
[] - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
[] - IF DECOM: mgmt dns entries removed.
[] - IF RECLAIM: set netbox state to 'inventory' and hostname to asset tag
scb2002
[] - system disks removed (by onsite)
[] - determine system age, under 5 years are reclaimed to spare, over 5 years are decommissioned.
[] - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
[] - IF DECOM: mgmt dns entries removed.
[] - IF RECLAIM: set netbox state to 'inventory' and hostname to asset tag
scb2003
[] - system disks removed (by onsite)
[] - determine system age, under 5 years are reclaimed to spare, over 5 years are decommissioned.
[] - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
[] - IF DECOM: mgmt dns entries removed.
[] - IF RECLAIM: set netbox state to 'inventory' and hostname to asset tag
scb2004
[] - system disks removed (by onsite)
[] - determine system age, under 5 years are reclaimed to spare, over 5 years are decommissioned.
[] - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
[] - IF DECOM: mgmt dns entries removed.
[] - IF RECLAIM: set netbox state to 'inventory' and hostname to asset tag
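The age rule in the DC-Ops steps (under 5 years reclaimed to spare, over 5 years decommissioned) can be written down as a small check. This is only an illustrative sketch: the function name `disposition` is hypothetical, the purchase date is assumed to be looked up by whoever handles the task, and treating a system that is exactly 5 years old as a decommission is an assumption, since the checklist does not say which way that case goes.

```python
from datetime import date


def disposition(purchase_date, today):
    """Return 'reclaim' for systems under 5 years old, else 'decommission'."""
    try:
        # Fifth anniversary of the purchase date.
        cutoff = purchase_date.replace(year=purchase_date.year + 5)
    except ValueError:
        # Purchased on Feb 29 and the target year is not a leap year.
        cutoff = purchase_date.replace(year=purchase_date.year + 5, day=28)
    # Assumption: exactly 5 years old counts as "over" and is decommissioned.
    return "reclaim" if today < cutoff else "decommission"


if __name__ == "__main__":
    # Example dates are made up for illustration only.
    print(disposition(date(2016, 3, 1), date(2020, 1, 1)))
    print(disposition(date(2014, 3, 1), date(2020, 1, 1)))
```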