original task request
thorium has fully replaced stat1001. I have disabled active checks and notifications for this node in icinga, and enabled role spare::system. It should be good to disappear. Thanks!
addressing reclaim or decom
The system warranty expired on 2014-04-29, so it is well out of warranty. Hosts of this age are typically decommissioned as they fall out of use.
steps for decommission
- - remove system from all puppet/hiera/active service use and add to role::spare - (otto handled this)
Please note that patchsets for all remaining steps cannot be merged until the system is offline! Don't merge changes that remove public DNS while the system can still come back online, since it would then hold an IP not tracked in our DNS config.
- remaining steps must all be done/merged in a short order/sequence. This is non-interrupt based tasking.
- - disable system from all icinga monitoring
- - power down system
- - disable system switch port (THIS MUST HAPPEN BEFORE THE REST HAPPENS)
- - remove salt/puppet entries (puppet node clean, puppet node deactivate, salt-key -d)
- - remove puppet entries - https://gerrit.wikimedia.org/r/#/c/332470/
- - remove system production dns entries - https://gerrit.wikimedia.org/r/#/c/332472/
- - remove system site.pp entries (was in as role spare)
END NON-INTERRUPT STEPS (Once the system switch port is disabled, the system can no longer call in and introduce issues from not being active in puppet or not having public dns entries. The remainder should be done in a timely fashion, but doesn't have to happen without interruption.)
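As a rough illustration of the salt/puppet cleanup step above, here is a minimal dry-run sketch that only prints the three commands named in the checklist, in order. The FQDN `stat1001.eqiad.wmnet` and the helper names `cleanup_commands` and `render` are assumptions for illustration, not something stated in this task. The `switch_port_disabled` guard is likewise a hypothetical way to encode the "switch port first" rule.

```python
import shlex


def cleanup_commands(fqdn, switch_port_disabled):
    """Return the puppet/salt cleanup commands for one host, in checklist order.

    Hypothetical helper: refuses to build the list unless the caller confirms
    the switch port is already disabled, mirroring the ordering rule above.
    """
    if not switch_port_disabled:
        raise RuntimeError("disable the switch port before removing puppet/salt entries")
    return [
        ["puppet", "node", "clean", fqdn],
        ["puppet", "node", "deactivate", fqdn],
        ["salt-key", "-d", fqdn],
    ]


def render(commands):
    """Quote each argument so the printed lines are safe to paste into a shell."""
    return [" ".join(shlex.quote(part) for part in cmd) for cmd in commands]


if __name__ == "__main__":
    # Assumed FQDN; replace with the real hostname of the system being decommissioned.
    for line in render(cleanup_commands("stat1001.eqiad.wmnet", switch_port_disabled=True)):
        print(line)
```

Nothing is executed here; the output is meant to be reviewed and run by hand on the appropriate puppet and salt masters.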
- - system disks wiped (by on-site dc ops)
- - system unracked, racktables updated
- - remove system switch port configuration
- - system mgmt dns entries removed
After all steps have been taken and system is fully decommissioned and removed from the rack, this can be resolved.