
decommission db2039
Closed, Resolved | Public | Request

Description

This task will track the decommission-hardware of server db2039.

The first 5 steps should be completed by the service owner who is returning the server to DC-Ops (for reclaim to spare or for decommissioning, depending on server configuration and age).

db2039

Steps for service owner:

Steps for DC-Ops:

The following steps must not be interrupted, as stopping partway would leave the system in an unfinished state.

Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to decommissioning (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) - asw-d-codfw:ge-1/0/14
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.
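
For reference, the DebMonitor removal in the last non-interrupt step can be sketched in Python. This is only an illustration of the curl command quoted above, not what wmf-decommission-host actually runs: the URL and the certificate and key paths are taken from that command, while the function names `build_debmonitor_delete` and `send` are hypothetical. The request is only built here; sending it is left to the caller.

```python
import ssl
import urllib.request

# Base URL taken from the curl command in the task description.
DEBMONITOR_BASE = "https://debmonitor.discovery.wmnet"


def build_debmonitor_delete(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one host (hypothetical helper)."""
    if not host_fqdn or "/" in host_fqdn:
        raise ValueError(f"not a plausible FQDN: {host_fqdn!r}")
    return urllib.request.Request(
        f"{DEBMONITOR_BASE}/hosts/{host_fqdn}", method="DELETE"
    )


def send(req: urllib.request.Request,
         cert: str = "/etc/debmonitor/ssl/cert.pem",
         key: str = "/etc/debmonitor/ssl/server.key") -> int:
    """Send the request with the client certificate; paths are from the task's curl command."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status


req = build_debmonitor_delete("db2039.codfw.wmnet")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it possible to check the target URL before anything is deleted.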

  • - Label disk #3 as broken so it doesn't get re-used (T226155)
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with status offline (unracked)
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
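
The netbox status changes named in the checklist above can be summarised as a small sketch. It only restates the wording of this task (decommissioning, Planned, offline); the function name `netbox_status` is hypothetical, and the returned strings are the labels used in the checklist, not necessarily the values the Netbox API expects.

```python
def netbox_status(decom: bool, unracked: bool = False) -> str:
    """Return the status label the checklist asks for (hypothetical helper).

    decom=False    -> host is reclaimed to spare: "Planned"
    decom=True     -> "decommissioning" while still racked,
                      "offline" once the system is unracked
    """
    if not decom:
        # The checklist only unracks hosts that are being decommissioned,
        # so `unracked` is ignored for spares.
        return "Planned"
    return "offline" if unracked else "decommissioning"


for decom, unracked in [(False, False), (True, False), (True, True)]:
    print(decom, unracked, netbox_status(decom, unracked))
```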

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2019-06-18T08:16:04Z] <marostegui> Remove db2039 from tendril and zarcillo - T225988

Change 517601 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Set db2039 to spare

https://gerrit.wikimedia.org/r/517601

Change 517601 merged by Marostegui:
[operations/puppet@production] mariadb: Set db2039 to spare

https://gerrit.wikimedia.org/r/517601

Marostegui updated the task description.
Marostegui added a subscriber: Papaul.

This host is ready for DC-Ops to take over and decommission

Marostegui updated the task description.

Please mark disk #3 as broken so it doesn't get re-used (T226155: Degraded RAID on db2039).

cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts: db2039.codfw.wmnet

  • db2039.codfw.wmnet
    • Removed from Puppet master and PuppetDB
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Removed from DebMonitor

Change 519473 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom db2039

https://gerrit.wikimedia.org/r/519473

Change 519474 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom db2039 prod dns

https://gerrit.wikimedia.org/r/519474

Change 519474 merged by RobH:
[operations/dns@master] decom db2039 prod dns

https://gerrit.wikimedia.org/r/519474

Change 519473 merged by RobH:
[operations/puppet@production] decom db2039

https://gerrit.wikimedia.org/r/519473

RobH updated the task description.
RobH removed a project: Patch-For-Review.
RobH moved this task from Backlog to Decommission on the ops-codfw board.

Change 524282 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for db2039

https://gerrit.wikimedia.org/r/524282

Change 524282 merged by Dzahn:
[operations/dns@master] DNS: Remove mgmt DNS for db2039

https://gerrit.wikimedia.org/r/524282

This is complete