scandium.eqiad.wmnet is in the labs network and is only used to run a single zuul-merger instance. We placed it in labs because the merger is an internal service that does not need to be reached from the internet.
contint1001 (the Jenkins/Zuul server) has a public IP address, so labs instances will be able to reach it. We should therefore move the zuul-merger there, which will let us phase out scandium entirely.
Stretch goal: refactor the puppet zuul::merger class so we can have several instances on a single server (will address T140297).
Decommission steps:
- - all system services confirmed offline from production use
- - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
- - remove system from all lvs/pybal active configuration
- - any service group puppet/hiera/dsh config removed
- - remove site.pp (replace with role::spare if system isn't shut down immediately during this process.)
START NON-INTERRUPTIBLE STEPS
- - disable puppet on host
- - remove all remaining puppet references (including role::spare)
- - power down host
- - disable switch port
- - remove production dns entries
- - puppet node clean, puppet node deactivate, salt key removed
END NON-INTERRUPTIBLE STEPS
- - system disks wiped by onsite
- - system unracked and decommissioned (by onsite), update racktables with result
- - switch port configuration removed from asw-a-eqiad for scandium when it is unracked.
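
For reference, the CLI-driven parts of the non-interruptible steps (disabling puppet, cleaning and deactivating the node, removing the salt key) could be sketched as the dry-run helper below. This is only an illustration: the function name `decommission_commands`, the "where to run" labels and the disable message are assumptions, not part of any existing tooling, and the script only prints commands rather than executing them. The manual steps (power down, switch port, DNS) still happen in between, in the order listed above.

```python
import shlex


def decommission_commands(fqdn, reason="decommission"):
    """Return (where, command) pairs for the CLI-driven decommission steps.

    Hypothetical helper: it builds command strings only and never runs them.
    The 'where' labels are assumptions about which machine each command
    is typed on.
    """
    host = shlex.quote(fqdn)
    return [
        # Stop the agent on the host itself so it no longer applies catalogs.
        ("host", f"puppet agent --disable {shlex.quote(reason)}"),
        # Remove the node's certificate and cached data on the puppetmaster.
        ("puppetmaster", f"puppet node clean {host}"),
        # Mark the node as deactivated in PuppetDB.
        ("puppetmaster", f"puppet node deactivate {host}"),
        # Delete the minion key on the salt master.
        ("salt master", f"salt-key -d {host}"),
    ]


if __name__ == "__main__":
    for where, command in decommission_commands("scandium.eqiad.wmnet"):
        print(f"[{where}] {command}")
```

Printing the commands first and pasting them by hand keeps a human in the loop for steps that cannot be interrupted once started.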