Decommission restbase-test200[123]
Closed, Resolved (Public)

Description

See the parent task (for Services to confirm offline) for step 1.

These three hosts were purchased on 2013-01-12 and are over 5 years old, so they will be decommissioned rather than reclaimed to spares.

restbase-test2001:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw-b-codfw:ge-5/0/19: restbase-test2001
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite) - please note as ssds these need the smartctl utility wipe, NOT a normal HDD wipe.
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

restbase-test2002:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw-b-codfw:ge-5/0/18: restbase-test2002
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite) - please note as ssds these need the smartctl utility wipe, NOT a normal HDD wipe.
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

restbase-test2003:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw-b-codfw:ge-5/0/16: restbase-test2003
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite) - please note as ssds these need the smartctl utility wipe, NOT a normal HDD wipe.
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

Event Timeline

faidon created this task.

Confirmed in the Services team meeting today; these machines can be decommissioned at the earliest convenience!

RobH subscribed.

Please ensure all decom requests are tagged with #hw-requests.

RobH updated the task description.

Change 419312 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom restbase-test200[123]

https://gerrit.wikimedia.org/r/419312

Change 419312 merged by RobH:
[operations/puppet@production] decom restbase-test200[123]

https://gerrit.wikimedia.org/r/419312

Change 419315 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] restbase-test200* prod dns removal

https://gerrit.wikimedia.org/r/419315

Change 419315 merged by RobH:
[operations/dns@master] restbase-test200* prod dns removal

https://gerrit.wikimedia.org/r/419315

RobH removed a project: Patch-For-Review.
RobH updated the task description.

These are ready to be wiped by onsite. Please note as SSDs, these need the specific smartctl utility run to erase them securely. The HDD method of writing zeros will NOT work.

Papaul raised the priority of this task from Low to Medium. (Mar 26 2018, 8:37 PM)

switch port information
asw-b5-codfw
restbase-test2001 ge-5/0/19
restbase-test2002 ge-5/0/16
restbase-test2003 ge-5/0/20

Papaul subscribed.

@RobH, can you please work on the switch ports and assign the task back to me for mgmt DNS removal?

Thanks.

RobH reassigned this task from RobH to Papaul. (Mar 29 2018, 4:43 PM)

@Papaul:

The switch port info you provided does not match the switch's configuration:

switch port information
asw-b5-codfw
restbase-test2001 ge-5/0/19
restbase-test2002 ge-5/0/16
restbase-test2003 ge-5/0/20

robh@asw-b-codfw> show interfaces descriptions | grep ge-5/0/19 
ge-5/0/19       down  down restbase-test2001

{master:2}
robh@asw-b-codfw> show interfaces descriptions | grep ge-5/0/16    
ge-5/0/16       down  down restbase-test2003

{master:2}
robh@asw-b-codfw> show interfaces descriptions | grep ge-5/0/20    
ge-5/0/20       up    up   labtestcontrol2001-eth0
ge-5/0/16       down  down restbase-test2003
ge-5/0/18       down  down restbase-test2002
ge-5/0/19       down  down restbase-test2001

Did you trace those out by hand and perhaps just have some mistakes? Can you re-check before I go disabling and removing ports please? It seems the switch is right, since the ports are down, and I'm guessing your update had the mistakes?

Please advise,
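
The comparison above can be sketched in a few lines of Python. This is only an illustration, not a tool used on this task: the function names `parse_descriptions` and `find_mismatches` are hypothetical, and the parser assumes each row has the four-column shape seen in the pasted output (interface, admin state, link state, description).

```python
import re

# Port mapping as reported on the task (copied from the comment above).
reported = {
    "restbase-test2001": "ge-5/0/19",
    "restbase-test2002": "ge-5/0/16",
    "restbase-test2003": "ge-5/0/20",
}

# Interface descriptions as printed by the switch (copied from the output above).
switch_output = """\
ge-5/0/20       up    up   labtestcontrol2001-eth0
ge-5/0/16       down  down restbase-test2003
ge-5/0/18       down  down restbase-test2002
ge-5/0/19       down  down restbase-test2001
"""


def parse_descriptions(text):
    """Return {description: interface} for rows shaped like
    'interface admin-state link-state description' (assumed format)."""
    ports = {}
    for line in text.splitlines():
        m = re.match(r"^(\S+)\s+(up|down)\s+(up|down)\s+(\S+)$", line.strip())
        if m:
            ports[m.group(4)] = m.group(1)
    return ports


def find_mismatches(reported, actual):
    """Return {host: (reported_port, switch_port)} where the two disagree."""
    return {
        host: (port, actual.get(host))
        for host, port in reported.items()
        if actual.get(host) != port
    }


actual = parse_descriptions(switch_output)
mismatches = find_mismatches(reported, actual)
for host, (rep, sw) in sorted(mismatches.items()):
    print(f"{host}: task says {rep}, switch says {sw}")
```

With the data pasted in this comment, restbase-test2001 agrees on both sides, while restbase-test2002 and restbase-test2003 are flagged.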

Switch port confirmation
asw-b5-codfw
restbase-test2001 ge-5/0/19
restbase-test2002 ge-5/0/16
restbase-test2003 ge-5/0/18

I've removed the descriptions from those switch ports.

@Papaul: moving forward, be aware that removing the switch port description can happen AFTER removing the mgmt DNS entries. The main thing is that neither of those should happen until after the host has been removed from the rack.

Hope that clarifies!
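
As a rough illustration of that ordering rule, here is a minimal Python sketch. The step names (`unrack`, `remove_mgmt_dns`, `remove_switch_port_config`) and the `order_violations` helper are hypothetical labels made up for this example; they are not part of any existing tooling.

```python
# Steps that must not happen until the host has been unracked.
# (Hypothetical step labels, chosen for this sketch only.)
REQUIRES_UNRACK = {"remove_switch_port_config", "remove_mgmt_dns"}


def order_violations(steps):
    """Return the steps from REQUIRES_UNRACK that occur before 'unrack'.

    The relative order of the two post-unrack steps is not checked,
    since either one may come first.
    """
    seen_unrack = False
    violations = []
    for step in steps:
        if step == "unrack":
            seen_unrack = True
        elif step in REQUIRES_UNRACK and not seen_unrack:
            violations.append(step)
    return violations


# Either order of the two cleanup steps is fine once the host is unracked.
ok_plan = ["unrack", "remove_mgmt_dns", "remove_switch_port_config"]
bad_plan = ["remove_mgmt_dns", "unrack", "remove_switch_port_config"]
```

Calling `order_violations(ok_plan)` yields an empty list, while `order_violations(bad_plan)` reports `remove_mgmt_dns` because it precedes the unrack step.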

Change 425842 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove DNS entries for restbase-test200[1-3]

https://gerrit.wikimedia.org/r/425842

Change 425842 merged by Dzahn:
[operations/dns@master] DNS: Remove DNS entries for restbase-test200[1-3]

https://gerrit.wikimedia.org/r/425842

Papaul updated the task description.
Papaul updated the task description.