
Decommission ms-fe2001-4
Closed, Resolved · Public

Description

These machines are old and need to be decommissioned; new machines were put into service in T152612: codfw: rack/setup ms-fe200[5-8]

ms-fe2001

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - mgmt dns entries removed.
  • - switch port configuration removed from switch once system is unracked

ms-fe2002

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - mgmt dns entries removed.
  • - switch port configuration removed from switch once system is unracked

ms-fe2003

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - mgmt dns entries removed.
  • - switch port configuration removed from switch once system is unracked

ms-fe2004

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - mgmt dns entries removed.
  • - switch port configuration removed from switch once system is unracked
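The per-host checklists above all follow the same ordered steps. As a rough illustration only, progress through them could be tracked with a small structure like the one below. The step labels paraphrase the checklist, and `STEPS` and `DecomChecklist` are hypothetical names invented for this sketch, not part of any Wikimedia tooling.

```python
from dataclasses import dataclass, field

# Step labels paraphrase the checklist above, in the order it lists them.
STEPS = [
    "services confirmed offline",
    "icinga checks disabled",
    "removed from lvs/pybal",
    "service group puppet/hiera/dsh config removed",
    "site.pp entry removed",
    "disks wiped",
    "unracked and racktables updated",
    "mgmt dns entries removed",
    "switch port configuration removed",
]


@dataclass
class DecomChecklist:
    """Tracks which decommission steps are done for one host (hypothetical helper)."""
    host: str
    done: set = field(default_factory=set)

    def complete(self, step: str) -> None:
        # Reject labels that are not on the checklist, so typos are caught early.
        if step not in STEPS:
            raise ValueError(f"unknown step: {step!r}")
        self.done.add(step)

    def next_step(self):
        # Return the first step not yet marked done, or None when all are done.
        for step in STEPS:
            if step not in self.done:
                return step
        return None


checklist = DecomChecklist("ms-fe2001")
checklist.complete("services confirmed offline")
print(checklist.host, "->", checklist.next_step())
```

Keeping the steps in one ordered list means every host is checked against the same sequence, which is the point of sharing a single checklist across decoms.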

Details

Related Gerrit Patches:
operations/dns : master · DNS/Decom Remove mgmt DNS entries for ms-fe200[1-4]
operations/dns : master · decom of ms-fe200[1-4]
operations/puppet : production · decom ms-fe2001 through ms-fe2004
operations/puppet : production · Decomission ms-fe200[1-4]

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 2 2017, 8:44 AM

Change 340694 had a related patch set uploaded (by Filippo Giunchedi):
[operations/puppet] Decomission ms-fe200[1-4]

https://gerrit.wikimedia.org/r/340694

Change 340694 merged by Filippo Giunchedi:
[operations/puppet] Decomission ms-fe200[1-4]

https://gerrit.wikimedia.org/r/340694

fgiunchedi reassigned this task from fgiunchedi to Papaul. Mar 2 2017, 9:35 AM
fgiunchedi added a project: hardware-requests.

All four machines ms-fe200[1-4] are running the spare::system role and can be decommissioned.
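The bracket shorthand ms-fe200[1-4] used in this task stands for four individual hosts. A minimal sketch of expanding it, assuming a single numeric range in brackets; `expand_hosts` is a hypothetical helper written for illustration, not an existing tool:

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand a 'prefix[a-b]suffix' shorthand into individual host names."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No bracket range present: treat the input as a single host name.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # preserve zero padding if the range uses it
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


print(expand_hosts("ms-fe200[1-4]"))
# → ['ms-fe2001', 'ms-fe2002', 'ms-fe2003', 'ms-fe2004']
```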

Papaul added a comment. Mar 2 2017, 4:08 PM

@fgiunchedi are all the previous steps done (removed from puppet, icinga, ...)? I can just start by removing the systems from DNS and begin the disk wipe process. Please note which steps are complete and which are left to be done.

Thanks.

RobH updated the task description. Mar 2 2017, 4:24 PM
RobH added a subscriber: RobH.

I've updated the base task description with the decommissioning checklist (it is listed as a sub-page of the Server Lifecycle page on Wikitech). This checklist should be used for all system reclaims/decoms so we ensure we don't miss any steps.

fgiunchedi updated the task description. Mar 2 2017, 4:40 PM

@Papaul I've effectively decommissioned the systems from production; they are still in puppet but running the spare role. I've marked the steps @RobH outlined as done.

Papaul triaged this task as Medium priority. Mar 6 2017, 3:54 PM
Papaul added a comment. Mar 7 2017, 3:35 PM

@fgiunchedi thank you. However, there are some steps that I will not be able to perform, such as:

  • disable puppet on host
  • remove all remaining puppet references (including role::spare)

Once those steps are complete, this task can be assigned to someone with network switch access to disable the different ports, and I can then take over this task.

Thanks

RobH added a comment. Mar 7 2017, 4:45 PM

@Papaul: You can actually remove the puppet references, but you won't be able to self merge. Are you up for doing that? If not, I'll handle that step for you just before you are ready to do the onsite steps, let me know!

Papaul added a comment. Mar 9 2017, 5:40 PM

@RobH yes please do. Thanks

RobH claimed this task. Mar 9 2017, 7:26 PM
RobH added a subscriber: Papaul.

Change 343331 had a related patch set uploaded (by RobH):
[operations/puppet] decom ms-fe2001 through ms-fe2004

https://gerrit.wikimedia.org/r/343331

RobH updated the task description. Mar 17 2017, 8:23 PM
RobH updated the task description. Mar 17 2017, 8:40 PM

Change 343331 merged by RobH:
[operations/puppet] decom ms-fe2001 through ms-fe2004

https://gerrit.wikimedia.org/r/343331

RobH updated the task description. Mar 17 2017, 8:48 PM

Change 343334 had a related patch set uploaded (by RobH):
[operations/dns] decom of ms-fe200[1-4]

https://gerrit.wikimedia.org/r/343334

RobH updated the task description. Mar 17 2017, 8:51 PM

Change 343334 merged by RobH:
[operations/dns] decom of ms-fe200[1-4]

https://gerrit.wikimedia.org/r/343334

RobH reassigned this task from RobH to Papaul. Mar 17 2017, 8:53 PM
RobH removed a project: Patch-For-Review.

Assigning to @Papaul for the remainder of the steps.

If these are HDDs, please wipe. If they are SSDs, we'll need to investigate wipe applications with the new SSD TRIM support to try to invalidate our data on them.

If we cannot successfully wipe the SSDs (if these have them), then they will have to be removed for physical destruction.
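The disk handling rule described above can be sketched as a small decision function. This is only an illustration of the stated policy; `disposal_action` and its return strings are hypothetical, and no particular wipe tool is implied:

```python
from typing import Optional


def disposal_action(disk_type: str, ssd_wipe_ok: Optional[bool] = None) -> str:
    """Return the next action for a disk, following the rule stated above.

    ssd_wipe_ok is None until an SSD wipe has been attempted.
    """
    kind = disk_type.lower()
    if kind == "hdd":
        return "wipe"
    if kind == "ssd":
        if ssd_wipe_ok is None:
            # No attempt yet: a TRIM-capable wipe tool still has to be evaluated.
            return "investigate TRIM-based wipe"
        if ssd_wipe_ok:
            return "wiped"
        return "remove for physical destruction"
    raise ValueError(f"unknown disk type: {disk_type!r}")


for disk, result in [("HDD", None), ("SSD", None), ("SSD", False)]:
    print(disk, result, "->", disposal_action(disk, result))
```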

disk wipe in progress.

Papaul updated the task description. Mar 21 2017, 4:51 PM

Change 344651 had a related patch set uploaded (by Papaul):
[operations/dns@master] DNS/Decom Remove mgmt DNS entries for ms-fe200[1-4]

https://gerrit.wikimedia.org/r/344651

Dzahn removed Papaul as the assignee of this task. Mar 29 2017, 1:07 AM
Dzahn added a subscriber: Dzahn.

@Papaul are all wipes done and the servers shut down? If yes, please assign to Rob for switch ports. @RobH there is https://gerrit.wikimedia.org/r/#/c/344651/ open to merge for the mgmt DNS.

Dzahn assigned this task to Papaul. Mar 29 2017, 1:07 AM
Papaul reassigned this task from Papaul to RobH. Mar 29 2017, 3:09 PM
Papaul updated the task description.
Dzahn reassigned this task from RobH to Ayokura. Apr 4 2017, 9:37 PM
Dzahn reassigned this task from Ayokura to ayounsi.
Dzahn added a subscriber: Ayokura.
Dzahn removed a subscriber: Ayokura.
faidon reassigned this task from ayounsi to RobH. Apr 4 2017, 9:51 PM
faidon added a subscriber: ayounsi.
Dzahn removed a project: netops.
RobH closed this task as Resolved. May 10 2017, 11:30 PM
RobH removed RobH as the assignee of this task.
RobH removed a project: Patch-For-Review.
RobH updated the task description.

Change 344651 merged by Dzahn:
[operations/dns@master] DNS/Decom Remove mgmt DNS entries for ms-fe200[1-4]

https://gerrit.wikimedia.org/r/344651

Dzahn added a comment. May 24 2017, 5:22 PM

and now it's actually resolved