
decommission labvirt101[01].eqiad.wmnet (Dec 2018 lease return)
Closed, ResolvedPublic

Description

These servers are leased, so they need to be shut down and... returned?

labvirt1010:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role::spare::system if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update Netbox with result
  • - system set aside for lease return
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.

labvirt1011:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role::spare::system if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update Netbox with result
  • - system set aside for lease return
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.
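
For reference, the DebMonitor cleanup step that appears in both checklists can be sketched in Python as below. This is only a dry-run illustration: it expands the host range from the task title and prints the curl command quoted in the checklist for each host, without executing anything. The helper names `expand_hosts` and `debmonitor_delete_command` are hypothetical and not part of any existing tooling; per the checklist, wmf-decommission-host normally handles this step.

```python
from __future__ import annotations

import re
import shlex


def expand_hosts(pattern: str) -> list[str]:
    """Expand one character-class range, e.g. 'labvirt101[01].eqiad.wmnet'.

    Hypothetical helper: only a single [..] group of literal characters
    is supported, which is all the task title needs.
    """
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        return [pattern]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{ch}{suffix}" for ch in match.group(1)]


def debmonitor_delete_command(host_fqdn: str) -> str:
    """Build (but do not run) the curl command quoted in the checklist."""
    args = [
        "sudo", "curl", "-X", "DELETE",
        f"https://debmonitor.discovery.wmnet/hosts/{host_fqdn}",
        "--cert", "/etc/debmonitor/ssl/cert.pem",
        "--key", "/etc/debmonitor/ssl/server.key",
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Dry run: print one command per host so it can be reviewed first.
    for host in expand_hosts("labvirt101[01].eqiad.wmnet"):
        print(debmonitor_delete_command(host))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere; the certificate paths only exist on the hosts named in the checklist.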

Event Timeline

Andrew created this task. Nov 29 2018, 2:57 PM
Restricted Application added a project: Operations. Nov 29 2018, 2:57 PM

Change 476522 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move labvirt1010/1011 to role::spare

https://gerrit.wikimedia.org/r/476522

Change 476522 merged by Andrew Bogott:
[operations/puppet@production] Move labvirt1010/1011 to role::spare

https://gerrit.wikimedia.org/r/476522

Andrew assigned this task to RobH. Nov 29 2018, 3:15 PM
Andrew updated the task description.
RobH updated the task description. Nov 29 2018, 4:51 PM
RobH renamed this task from Decom/return labvirt1010 and 1011 to decommission (lease return) labvirt101[01].eqiad.wmnet. Nov 29 2018, 4:57 PM
RobH triaged this task as High priority.
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH added a comment. Nov 29 2018, 5:03 PM

Switch ports on asw2-b-eqiad:

robh@asw2-b-eqiad> show interfaces descriptions | grep labvirt1010 
ge-3/0/14       up    up   labvirt1010 eth0
ge-3/0/15       up    up   labvirt1010 eth1

{master:2}
robh@asw2-b-eqiad> show interfaces descriptions | grep labvirt1011    
ge-3/0/16       up    up   labvirt1011 eth0
ge-3/0/17       up    up   labvirt1011 eth1
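
Since the port assignments above are needed again later, when the switch port configuration is removed, a small parser can turn the pasted output into a per-host list. The following is a minimal sketch, assuming the four-or-five column layout shown above (interface, admin state, link state, host, NIC) and that only `ge-` interfaces are of interest; `ports_by_host` is a hypothetical name, not an existing tool.

```python
from collections import defaultdict

# Interface lines copied from the switch output in this comment.
SAMPLE = """\
ge-3/0/14       up    up   labvirt1010 eth0
ge-3/0/15       up    up   labvirt1010 eth1
ge-3/0/16       up    up   labvirt1011 eth0
ge-3/0/17       up    up   labvirt1011 eth1
"""


def ports_by_host(output):
    """Group (interface, nic) pairs by host name.

    Lines that do not start with 'ge-' (prompts, '{master:2}', blanks)
    are skipped; that filter is an assumption made for this sketch.
    """
    ports = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("ge-"):
            continue
        interface, _admin, _link, host = fields[:4]
        nic = fields[4] if len(fields) > 4 else ""
        ports[host].append((interface, nic))
    return dict(ports)


if __name__ == "__main__":
    for host, ports in sorted(ports_by_host(SAMPLE).items()):
        print(host, ", ".join(f"{iface} ({nic})" for iface, nic in ports))
```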
RobH updated the task description. Nov 29 2018, 5:05 PM

wmf-decommission-host was executed by robh for labvirt1010.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for labvirt1011.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
RobH updated the task description. Nov 29 2018, 5:10 PM

Change 476570 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] removing references to decom servers labvirt101[01]

https://gerrit.wikimedia.org/r/476570

Change 476570 merged by RobH:
[operations/puppet@production] removing references to decom servers labvirt101[01]

https://gerrit.wikimedia.org/r/476570

RobH reassigned this task from RobH to Cmjohnson. Nov 29 2018, 5:19 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.
RobH added subscribers: Cmjohnson, RobH.

Ok, these are ready for @Cmjohnson to do the SSD smartctl secure erase on these systems. As these are lease returns, they are high priority.

RobH renamed this task from decommission (lease return) labvirt101[01].eqiad.wmnet to decommission labvirt101[01].eqiad.wmnet (Dec 2018 lease return). Dec 3 2018, 9:18 PM
RobH mentioned this in Unknown Object (Task). Dec 3 2018, 11:48 PM
RobH added a parent task: Unknown Object (Task). Dec 3 2018, 11:58 PM

Could you please refresh netbox status for these 2 servers:

They should probably be marked as something other than 'ACTIVE'.

I marked these as 'offline' which is not totally accurate but the closest thing I could find. Is it safe to assume these have long since been packed up and shipped off?

RobH added a comment. Mar 26 2019, 6:43 PM

Please note these systems still need their SSDs securely erased per https://wikitech.wikimedia.org/wiki/Dc-operations/Securely_Erasing_Media

Cmjohnson closed this task as Resolved. Apr 24 2019, 3:02 PM
Cmjohnson updated the task description.

All the disks were securely wiped and the servers reset to server defaults.