The following two hosts are in x1 and are partially broken: the BBUs are broken on db2033 {T184888}, and db2034 has had a long history of HW issues {T150233} {T149553}.
They should be decommissioned.
db2033 is a slave ready for DCOps to decommission {T220070}
db2034 is a master and should be decommissioned once the new host for x1 arrives (pending procurement)
=db2034=
== Decommission Checklist ==
[x] - all system services confirmed offline from production use - should be done by #DBA team
[x] - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/510074/
[x] - remove system from all lvs/pybal active configuration - should be done by #DBA team: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/510083/
[x] - any service group puppet/hiera/dsh config removed - should be done by #DBA team
[x] - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.) - should be done by #DBA team: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/510074/
START NON-INTERRUPTIBLE STEPS - please assign to @robh for the non-interrupt steps
[] - disable puppet on host
[] - power down host
[] - update status in netbox (inventory for decom, planned for spare)
[] - disable switch port
[] - switch port assignment noted on this task (for later removal)
[] - remove all remaining puppet references (include role::spare)
[] - remove production dns entries
[] - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
[] - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
END NON-INTERRUPTIBLE STEPS
[] - system disks wiped (by onsite): use hdparm for SSDs and wipe for HDDs
[] - Label the BBU as broken so it doesn't get re-used
[] - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
[] - IF DECOM: switch port configuration removed from switch once system is unracked.
[] - IF DECOM: add system to decommission tracking google sheet
[] - IF DECOM: mgmt dns entries removed.