
Decommission db2033
Closed, Resolved, Public

Description

db2033 is ready for DCOps to take over

db2033

Decommission Checklist

  • All system services confirmed offline from production use (to be done by the DBA team)
  • Set all Icinga checks to maintenance mode/disabled while the reclaim/decommission takes place
  • Remove the system from all LVS/PyBal active configuration (to be done by the DBA team)
  • Any service group Puppet/Hiera/dsh config removed (to be done by the DBA team)
  • Remove from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process); to be done by the DBA team: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/501136/

START NON-INTERRUPTIBLE STEPS (please assign to @RobH for the non-interruptible steps)

  • Disable Puppet on the host
  • Power down the host
  • Update status in Netbox (inventory for decom, planned for spare)
  • Disable the switch port
  • Switch port assignment noted on this task (for later removal): asw-c-codfw:ge-6/0/0
  • Remove all remaining Puppet references (including role::spare)
  • Remove production DNS entries
  • Run puppet node clean and puppet node deactivate (handled by wmf-decommission-host)
  • Remove DebMonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS
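
The DebMonitor cleanup step above is normally done by wmf-decommission-host (the cookbook log in the timeline reports "Removed from DebMonitor"). If it ever has to be repeated by hand, a minimal dry-run sketch like the following, assuming a POSIX shell, prints the exact command for review instead of running it. The function name debmonitor_delete_cmd is hypothetical; the URL and certificate/key paths are copied from the checklist.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (never execute) the DebMonitor DELETE
# command for one host. URL and cert/key paths are taken from the checklist.
debmonitor_delete_cmd() {
  host_fqdn="$1"
  # Refuse an empty FQDN so the printed URL never ends in a bare /hosts/.
  if [ -z "$host_fqdn" ]; then
    echo "usage: debmonitor_delete_cmd HOST_FQDN" >&2
    return 1
  fi
  printf '%s\n' "sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${host_fqdn} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key"
}

# Example: print the command for this task's host.
debmonitor_delete_cmd db2033.codfw.wmnet
```

Printing rather than executing keeps the destructive DELETE a deliberate, reviewed action.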

  • System disks wiped (by onsite staff): use hdparm for SSDs and wipe for HDDs
  • Label the BBU as broken so it doesn't get reused
  • Label disk #10 as broken so it doesn't get reused [T220074]
  • IF DECOM: system unracked and decommissioned (by onsite staff); update Netbox with the result
  • IF DECOM: switch port configuration removed from the switch once the system is unracked
  • IF DECOM: add the system to the decommission tracking Google Sheet
  • IF DECOM: mgmt DNS entries removed
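
For the disk wipe step, a minimal dry-run sketch, assuming Linux with sysfs and a POSIX shell, picks the tool per disk type as the checklist describes (hdparm for SSDs, wipe for HDDs). It only prints commands. The function names, the placeholder ATA password tmppass, and the use of /sys/block/<dev>/queue/rotational to tell SSDs from HDDs are assumptions, not part of the original procedure. Behind a hardware RAID controller the OS may only see logical volumes, and ATA Secure Erase fails on a drive in the "frozen" security state, so check the exact flags against hdparm(8) and wipe(1) before running anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (never execute) the wipe command(s) for
# one disk. $1 = block device name (e.g. sda), $2 = rotational flag
# (1 = spinning HDD, 0 = SSD, as reported by sysfs).
wipe_cmds() {
  dev="$1"
  rotational="$2"
  case "$rotational" in
    1)
      # HDD: overwrite with the wipe utility (flags deliberately omitted).
      echo "wipe /dev/$dev"
      ;;
    0)
      # SSD: ATA Secure Erase via hdparm. Set a throwaway user password
      # (placeholder "tmppass"), then issue the erase with the same password.
      echo "hdparm --user-master u --security-set-pass tmppass /dev/$dev"
      echo "hdparm --user-master u --security-erase tmppass /dev/$dev"
      ;;
    *)
      echo "unknown rotational value for $dev: $rotational" >&2
      return 1
      ;;
  esac
}

# Hypothetical wrapper: read the rotational flag for a real device from sysfs.
wipe_cmds_for() {
  wipe_cmds "$1" "$(cat "/sys/block/$1/queue/rotational")"
}
```

On the host, wipe_cmds_for sda would print the commands for /dev/sda; they are meant to be reviewed and then run by hand.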

Event Timeline

cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts: db2033.codfw.wmnet

  • db2033.codfw.wmnet
    • Removed from Puppet master and PuppetDB
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Removed from DebMonitor

Change 506012 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] db2033 decom

https://gerrit.wikimedia.org/r/506012

Change 506012 merged by RobH:
[operations/puppet@production] db2033 decom

https://gerrit.wikimedia.org/r/506012

Change 506014 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom 2033 prod dns

https://gerrit.wikimedia.org/r/506014

Change 506014 merged by RobH:
[operations/dns@master] decom 2033 prod dns

https://gerrit.wikimedia.org/r/506014

RobH removed a project: Patch-For-Review.
RobH updated the task description.

Change 506416 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2033.yaml: Remove file

https://gerrit.wikimedia.org/r/506416

Change 506466 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for db2033

https://gerrit.wikimedia.org/r/506466

Change 506466 merged by Dzahn:
[operations/dns@master] DNS: Remove mgmt DNS for db2033

https://gerrit.wikimedia.org/r/506466

Change 506416 merged by Dzahn:
[operations/puppet@production] db2033.yaml: Remove file

https://gerrit.wikimedia.org/r/506416