
Decommission maps-test cluster
Closed, Resolved, Public

Description

The maps-test servers are no longer used since we've moved to testing in the Beta Cluster (T172090), and should be removed.

Servers to be decommissioned:

  • maps-test2001.codfw.wmnet
  • maps-test2002.codfw.wmnet
  • maps-test2003.codfw.wmnet
  • maps-test2004.codfw.wmnet
  • all system services confirmed offline from production use
  • set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove the site.pp entry (replace it with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production dns entries
  • puppet node clean, puppet node deactivate
  • remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS
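The debmonitor step is a single authenticated HTTP DELETE per host. As a rough illustration only (not the tooling used on this task), the same request can be built with the Python standard library; the endpoint and certificate paths are copied from the curl command above, while the function names are hypothetical. Like curl, it assumes the debmonitor server certificate is trusted by the system CA store.

```python
import ssl
import urllib.request

# Endpoint and client certificate paths as given in the curl command above.
DEBMONITOR_HOSTS_URL = "https://debmonitor.discovery.wmnet/hosts/"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def build_debmonitor_delete(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one host (hypothetical helper)."""
    return urllib.request.Request(DEBMONITOR_HOSTS_URL + host_fqdn, method="DELETE")


def send_with_client_cert(request: urllib.request.Request) -> int:
    """Send the request with a TLS client certificate, like curl --cert/--key.

    Needs read access to the private key, which is why the checklist uses sudo.
    """
    context = ssl.create_default_context()  # verifies the server against system CAs
    context.load_cert_chain(certfile=CERT_PATH, keyfile=KEY_PATH)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

Splitting "build" from "send" lets the four requests for maps-test2001 through maps-test2004 be printed and reviewed before anything is deleted.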

  • system disks wiped (by onsite)
  • IF DECOM: system unracked and decommissioned (by onsite), update netbox with result
  • IF DECOM: switch port configuration removed from switch once system is unracked
  • IF DECOM: add system to decommission tracking google sheet
  • IF DECOM: mgmt dns entries removed
  • change netbox status to offline when unracked
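The checklist is strictly ordered, and the middle block must not be left half done. A minimal, hypothetical tracker for one host could enforce both rules; the step labels below are paraphrased from the checklist, and it assumes the full "IF DECOM" path (as for these four hosts) rather than a reclaim to spares.

```python
from typing import List, Optional

# Step labels paraphrased from the checklist above; list order is checklist order.
PRE_STEPS = [
    "services offline",
    "icinga checks downtimed",
    "removed from lvs/pybal",
    "puppet/hiera/dsh config removed",
    "site.pp entry removed",
]
NON_INTERRUPTIBLE = [
    "puppet disabled",
    "host powered down",
    "switch port disabled",
    "switch port noted on task",
    "puppet references removed",
    "production dns removed",
    "puppet node clean/deactivate",
    "debmonitor entry removed",
]
# Assumes the full decommission path, so the IF DECOM steps are included.
POST_STEPS = [
    "disks wiped",
    "unracked, netbox updated",
    "switch port config removed",
    "added to tracking sheet",
    "mgmt dns removed",
    "netbox status offline",
]
ALL_STEPS = PRE_STEPS + NON_INTERRUPTIBLE + POST_STEPS


def next_step(done: List[str]) -> Optional[str]:
    """Return the next pending step, or None when the host is fully done."""
    if done != ALL_STEPS[: len(done)]:
        raise ValueError("steps completed out of checklist order")
    if len(done) == len(ALL_STEPS):
        return None
    return ALL_STEPS[len(done)]


def inside_non_interruptible(done: List[str]) -> bool:
    """True if some, but not all, of the non-interruptible steps are done."""
    start = len(PRE_STEPS)
    end = start + len(NON_INTERRUPTIBLE)
    return start < len(done) < end
```

A host for which `inside_non_interruptible` returns True should not be set aside until the block between START and END is finished.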

Event Timeline

IMO this is lower priority than getting the production maps cluster updated to Stretch.

Gehel triaged this task as Low priority. Aug 27 2018, 3:08 PM

Change 460006 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: decommission maps-test cluster

https://gerrit.wikimedia.org/r/460006

Change 460006 merged by Gehel:
[operations/puppet@production] maps: decommission maps-test cluster

https://gerrit.wikimedia.org/r/460006

Gehel updated the task description.
Gehel changed Risk Rating from N/A to default.

wmf-decommission-host was executed by robh for maps-test2001.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for maps-test2002.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for maps-test2003.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for maps-test2004.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
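The same five actions were run once per host. The host set can be written compactly as maps-test200[1-4], as in the mgmt DNS change below; the hypothetical sketch here expands that notation and lists the actions each run should report, which is handy as a dry-run checklist. It is not the wmf-decommission-host implementation and only supports a single numeric [start-end] group.

```python
import re
from typing import Dict, List

# Actions reported by each wmf-decommission-host run above.
ACTIONS = (
    "Revoked Puppet certificate",
    "Removed from PuppetDB",
    "Downtimed host on Icinga",
    "Downtimed mgmt interface on Icinga",
    "Removed from DebMonitor",
)


def expand_host_range(pattern: str) -> List[str]:
    """Expand one numeric bracket range, e.g. 'maps-test200[1-4].codfw.wmnet'.

    Zero padding follows the width of the range start; no group means one host.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


def decommission_plan(pattern: str) -> Dict[str, List[str]]:
    """Map each expanded host to the actions its run is expected to perform."""
    return {host: list(ACTIONS) for host in expand_host_range(pattern)}
```

For example, `expand_host_range("maps-test200[1-4].codfw.wmnet")` yields the four hostnames listed in the task description.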
RobH added subscribers: Papaul, RobH.

All switch ports added to the disabled group. Once the disks are wiped and these are unracked, @Papaul can delete the description off of each of these switch ports and we'll be set.

@Papaul

To delete the switch port description after these are wiped and unracked, do the following (using maps-test2001 as the example in the commands):

  • log in to asw-a-codfw
  • enter edit mode and remove the description with the following commands:
edit
edit interfaces
edit ge-5/0/3
delete description
top
show | compare

Then check your change to ensure it's correct and, if so, commit it:

commit comment T202898
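Only the interface name differs from host to host, so the sequence above can be generated per port. This is a hypothetical helper, not existing tooling; ge-5/0/3 is the maps-test2001 port from the example, the ports for the other three hosts are not listed here, and the name check is a simplified pattern that accepts only ge-/xe-/et- style names.

```python
import re
from typing import List

# Simplified Junos physical interface pattern, e.g. ge-5/0/3 (assumption).
INTERFACE_RE = re.compile(r"(ge|xe|et)-\d+/\d+/\d+")


def delete_port_description_commands(interface: str, task_id: str) -> List[str]:
    """Return the CLI sequence from the steps above for one switch port."""
    if not INTERFACE_RE.fullmatch(interface):
        raise ValueError(f"unexpected interface name: {interface!r}")
    return [
        "edit",                        # enter configuration mode
        "edit interfaces",
        f"edit {interface}",
        "delete description",
        "top",                         # back to the top of the hierarchy
        "show | compare",              # review the candidate diff first
        f"commit comment {task_id}",   # commit, tagged with the task ID
    ]
```

Generating the list makes it easy to paste the identical, reviewed sequence for each port instead of retyping it.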
RobH updated the task description.
RobH moved this task from Backlog to pending onsite steps (codfw) on the decommission-hardware board.
RobH moved this task from Backlog to Decommission on the ops-codfw board.

Change 461221 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom maps-test cluster in codfw

https://gerrit.wikimedia.org/r/461221

Change 461221 merged by RobH:
[operations/puppet@production] decom maps-test cluster in codfw

https://gerrit.wikimedia.org/r/461221

Change 461222 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom maps-test cluser prod dns

https://gerrit.wikimedia.org/r/461222

Change 461222 merged by RobH:
[operations/dns@master] decom maps-test cluser prod dns

https://gerrit.wikimedia.org/r/461222

Removing maps from this ticket, since there isn't any work left on our side.

@RobH: I'll let you close it when done on your side.

RobH removed a project: Patch-For-Review.

Ok, this was neglected. This is now ready for @Papaul to securely erase the disks on all 4 maps-test systems.

Change 508896 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for maps-test200[1-4]

https://gerrit.wikimedia.org/r/508896

Change 508896 merged by Dzahn:
[operations/dns@master] DNS: Remove mgmt DNS for maps-test200[1-4]

https://gerrit.wikimedia.org/r/508896