
decommission labvirt101[01].eqiad.wmnet (Dec 2018 lease return)
Closed, Resolved, Public

Description

These servers are leased, so they need to be shut down and returned to the lessor.

labvirt1010:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare::system if the system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update Netbox with result
  • - system set aside for lease return
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.

labvirt1011:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare::system if the system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update Netbox with result
  • - system set aside for lease return
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.
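The debmonitor removal step in the checklists above is normally handled by wmf-decommission-host. For illustration, it can be sketched in Python using only the standard library. This is a minimal sketch, assuming the endpoint and the certificate and key paths from the curl command in the checklist, and assuming the system's default CA bundle trusts the debmonitor server certificate. The helper names build_delete_request and delete_host are hypothetical and are not part of any Wikimedia tool.

```python
import ssl
import urllib.request

# Endpoint and client-certificate paths copied from the curl command in the checklist.
# Assumption: the default CA bundle is enough to verify the debmonitor server cert.
DEBMONITOR_HOSTS_URL = "https://debmonitor.discovery.wmnet/hosts/"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def build_delete_request(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one host (hypothetical helper)."""
    # Reject empty names and anything that would change the URL path.
    if not host_fqdn or "/" in host_fqdn:
        raise ValueError(f"invalid host FQDN: {host_fqdn!r}")
    return urllib.request.Request(DEBMONITOR_HOSTS_URL + host_fqdn, method="DELETE")


def delete_host(host_fqdn: str, timeout: float = 30.0) -> int:
    """Send the DELETE using the client certificate and return the HTTP status code.

    Intended to mirror the curl command in the checklist. Reading the private key
    needs root, which is why the original command uses sudo. urlopen raises
    urllib.error.HTTPError for 4xx/5xx responses.
    """
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=CERT_PATH, keyfile=KEY_PATH)
    request = build_delete_request(host_fqdn)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it lets the URL and method be checked without network access or key material. On a host that has the certificate, the call would be `delete_host("labvirt1010.eqiad.wmnet")`, repeated for labvirt1011.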

Event Timeline

Change 476522 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move labvirt1010/1011 to role::spare

https://gerrit.wikimedia.org/r/476522

Change 476522 merged by Andrew Bogott:
[operations/puppet@production] Move labvirt1010/1011 to role::spare

https://gerrit.wikimedia.org/r/476522

Andrew updated the task description.
RobH renamed this task from "Decom/return labvirt1010 and 1011" to "decommission (lease return) labvirt101[01].eqiad.wmnet". (Nov 29 2018, 4:57 PM)
RobH triaged this task as High priority.
RobH removed a project: Patch-For-Review.
RobH updated the task description.

Switch ports on asw2-b-eqiad:

robh@asw2-b-eqiad> show interfaces descriptions | grep labvirt1010 
ge-3/0/14       up    up   labvirt1010 eth0
ge-3/0/15       up    up   labvirt1010 eth1

{master:2}
robh@asw2-b-eqiad> show interfaces descriptions | grep labvirt1011    
ge-3/0/16       up    up   labvirt1011 eth0
ge-3/0/17       up    up   labvirt1011 eth1
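To record a host's switch ports on the task (the "switch port assignment noted on this task" step), the `show interfaces descriptions` output captured above can be filtered with a short script. This is a minimal sketch that works on text already copied from the switch and does not connect to it. It assumes the column layout shown (interface, admin status, link status, description) and assumes the description begins with the hostname, as in `labvirt1010 eth0`. The names parse_descriptions and ports_for_host are hypothetical.

```python
import re
from typing import List, NamedTuple

# Assumed layout, e.g. "ge-3/0/14       up    up   labvirt1010 eth0".
# Prompt lines ("robh@asw2-b-eqiad> ...") and "{master:2}" do not match and are skipped.
PORT_LINE = re.compile(
    r"^(?P<interface>[a-z]+-\d+/\d+/\d+(?:\.\d+)?)\s+"
    r"(?P<admin>up|down)\s+(?P<link>up|down)\s+(?P<description>.*\S)\s*$"
)


class PortDescription(NamedTuple):
    interface: str
    admin: str
    link: str
    description: str


def parse_descriptions(output: str) -> List[PortDescription]:
    """Turn captured 'show interfaces descriptions' text into records."""
    ports = []
    for line in output.splitlines():
        match = PORT_LINE.match(line)
        if match:
            ports.append(PortDescription(**match.groupdict()))
    return ports


def ports_for_host(output: str, hostname: str) -> List[str]:
    """Return the interfaces whose description starts with the given hostname."""
    return [
        port.interface
        for port in parse_descriptions(output)
        if port.description.split()[0] == hostname
    ]
```

Matching on the first word of the description, rather than using a substring test, prevents a host such as labvirt1010 from also matching a longer name that begins with the same characters.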

wmf-decommission-host was executed by robh for labvirt1010.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for labvirt1011.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

Change 476570 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] removing references to decom servers labvirt101[01]

https://gerrit.wikimedia.org/r/476570

Change 476570 merged by RobH:
[operations/puppet@production] removing references to decom servers labvirt101[01]

https://gerrit.wikimedia.org/r/476570

RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.
RobH added subscribers: Cmjohnson, RobH.

Ok, these are ready for @Cmjohnson to do the SSD smartctl secure erase on these systems. As these are lease returns, they are high priority.

RobH renamed this task from "decommission (lease return) labvirt101[01].eqiad.wmnet" to "decommission labvirt101[01].eqiad.wmnet (Dec 2018 lease return)". (Dec 3 2018, 9:18 PM)
RobH mentioned this in Unknown Object (Task). (Dec 3 2018, 11:48 PM)
RobH added a parent task: Unknown Object (Task). (Dec 3 2018, 11:58 PM)

Could you please refresh the Netbox status for these two servers (labvirt1010 and labvirt1011)?

They should probably be marked as something other than 'ACTIVE'.

I marked these as 'offline' which is not totally accurate but the closest thing I could find. Is it safe to assume these have long since been packed up and shipped off?

Cmjohnson updated the task description.

All the disks were securely wiped and the servers were reset to their default settings.