The server had memory, mainboard and power supply issues (T194172) and has been out of warranty since January, so it should be decommissioned.
This checklist can be copied and pasted into Phabricator hardware request tasks when reclaiming systems to spare or decommissioning them.
- - all system services confirmed offline from production use
- - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
- - remove system from all lvs/pybal active configuration
- - any service group puppet/hiera/dsh config removed
- - replace with role(spare::system) in site.pp
START NON-INTERRUPTIBLE STEPS
- - disable puppet on host
- - power down host
- - update status in netbox (inventory for decom, planned for spare)
- - disable switch port
- - switch port assignment noted on this task (for later removal) aw-c-codfw:ge-4/0/37
- - remove all remaining puppet references (including role::spare)
- - remove production dns entries
- - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
- - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
END NON-INTERRUPTIBLE STEPS
- - system disks wiped (by onsite)
- - IF DECOM: system unracked and decommissioned (by onsite), update netbox with resulting removal from rack and change to 'offline' status
- - IF DECOM: switch port configuration removed from switch once system is unracked.
- - IF DECOM: add system to decommission tracking google sheet
- - IF DECOM: mgmt dns entries removed.
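
For cases where the debmonitor removal step has to be run by hand, the following is a minimal sketch of a shell helper that prints the command for a given host so it can be reviewed before running. The function name `debmonitor_delete_cmd` and the example hostname are hypothetical; the URL and certificate paths are copied unchanged from the checklist step above.

```shell
# Hypothetical helper (not part of wmf-decommission-host): prints the
# debmonitor removal command for one host instead of executing it.
# URL and certificate paths are taken verbatim from the checklist.
debmonitor_delete_cmd() {
    host_fqdn="$1"
    if [ -z "$host_fqdn" ]; then
        echo "usage: debmonitor_delete_cmd HOST_FQDN" >&2
        return 1
    fi
    printf 'sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/%s --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key\n' "$host_fqdn"
}
```

Calling `debmonitor_delete_cmd host1001.example.wmnet` (a made-up hostname) prints the full command line; printing rather than executing keeps the step reviewable, since the checklist notes it is normally handled by wmf-decommission-host.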