
Decommission ms-be2001 - ms-be2012
Closed, ResolvedPublic

Description

These hosts belong to the first swift batch, and newer hardware is now online to replace them.
Note that the HDDs/SSDs could likely be used as spares if there is a shortage of those.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked. (can assign back to @RobH for this and last step)
  • - mgmt dns entries removed.
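
The checklist above marks one block of steps as non-interruptible: once the first of them is started, the block should be finished before the task is paused or handed over. A minimal sketch of that rule follows. The `Step` class, the `safe_to_pause` helper, and the abbreviated step names are hypothetical illustrations, not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    atomic: bool = False  # True for steps inside the non-interruptible block
    done: bool = False


def safe_to_pause(steps):
    """Pausing is safe unless the non-interruptible block is only partly done."""
    block = [s for s in steps if s.atomic]
    started = any(s.done for s in block)
    finished = all(s.done for s in block)
    return not started or finished


def make_checklist():
    # Abbreviated, hypothetical encoding of the checklist above.
    return [
        Step("icinga checks set to maint mode"),
        Step("removed from lvs/pybal config"),
        Step("disable puppet on host", atomic=True),
        Step("power down host", atomic=True),
        Step("disable switch port", atomic=True),
        Step("remove production dns entries", atomic=True),
        Step("system disks wiped"),
        Step("mgmt dns entries removed"),
    ]


if __name__ == "__main__":
    steps = make_checklist()
    print(safe_to_pause(steps))  # no atomic step started yet
    steps[2].done = True
    print(safe_to_pause(steps))  # block started but not finished
```

Modelling the block as a flag on each step keeps the ordering in one list, so the same data can drive both the checklist display and the hand-off check.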

Details

Related Gerrit Patches:
[operations/puppet@production] Remove stray swift backend servers from site.pp
[operations/dns@master] DNS: Remove mgmt DNS entries for ms-be20[0-1[1-9]
[operations/dns@master] decommission of ms-be2001 through ms-be2012
[operations/puppet@production] ms-be2001 through ms-be2012 decom
[operations/puppet@production] decom ms-be2001 - ms-be2012

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 12 2017, 9:49 AM

Mentioned in SAL (#wikimedia-operations) [2017-04-12T09:52:39Z] <godog> swift codfw-prod: ms-be2001 - ms-be2012 initial decom - T162785

fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board. Apr 12 2017, 9:52 AM

Mentioned in SAL (#wikimedia-operations) [2017-05-05T08:46:09Z] <godog> swift codfw-prod: ms-be2001 - ms-be2012 weight 700 - T162785

Mentioned in SAL (#wikimedia-operations) [2017-05-08T09:17:00Z] <godog> swift codfw-prod: more ms-be2001/ms-be2012 decom - T162785

Mentioned in SAL (#wikimedia-operations) [2017-05-15T09:10:56Z] <godog> swift codfw-prod: more ms-be2001/ms-be2012 decom - T162785

Change 356017 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] decom ms-be2001 - ms-be2012

https://gerrit.wikimedia.org/r/356017

Change 356017 merged by Filippo Giunchedi:
[operations/puppet@production] decom ms-be2001 - ms-be2012

https://gerrit.wikimedia.org/r/356017

fgiunchedi renamed this task from Decomission ms-be2001 - ms-be2012 to Decommission ms-be2001 - ms-be2012. May 29 2017, 9:08 AM
fgiunchedi removed fgiunchedi as the assignee of this task.
fgiunchedi edited projects, added hardware-requests; removed Patch-For-Review, Operations.
fgiunchedi updated the task description.
Restricted Application added a project: Operations. · May 29 2017, 9:08 AM
fgiunchedi moved this task from Doing to Blocked on the User-fgiunchedi board. May 29 2017, 9:09 AM
Marostegui triaged this task as Medium priority.
Marostegui added a project: ops-codfw.

@Marostegui There are other steps that need to be done before this task can be assigned to me.

Ah, ok - sorry @Papaul - I thought you'd take over from the non-interruptible section as stated here: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Steps_for_DC-OPS_.28with_network_switch_access.29

This comment was removed by Marostegui.
RobH claimed this task. Edited Jun 13 2017, 3:15 PM
RobH added subscribers: Papaul, RobH.

I can take this over from here to the onsite wipe stage.

Thanks for doing all the steps up to the non-interruptible section, folks!

Change 358615 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] ms-be2001 through ms-be2012 decom

https://gerrit.wikimedia.org/r/358615

Change 358615 merged by RobH:
[operations/puppet@production] ms-be2001 through ms-be2012 decom

https://gerrit.wikimedia.org/r/358615

Change 358618 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decommission of ms-be2001 through ms-be2012

https://gerrit.wikimedia.org/r/358618

Change 358618 merged by RobH:
[operations/dns@master] decommission of ms-be2001 through ms-be2012

https://gerrit.wikimedia.org/r/358618

RobH updated the task description. Jun 13 2017, 3:41 PM

switch port details:

asw-a-codfw:

ge-1/0/1 ms-be2001
ge-3/0/40 ms-be2002
ge-4/0/40 ms-be2003
ge-5/0/18 ms-be2004

asw-b-codfw:

ge-1/0/4 ms-be2005
ge-3/0/39 ms-be2006
ge-4/0/16 ms-be2007
ge-5/0/23 ms-be2008

asw-c-codfw:

ge-1/0/9 ms-be2009
ge-3/0/0 ms-be2010
ge-4/0/0 ms-be2011
ge-5/0/0 ms-be2012

RobH reassigned this task from RobH to Papaul. Jun 13 2017, 3:47 PM
RobH updated the task description.

Ok, this is now ready for all disks to be wiped, and then removed from the racks for decommission.

Disk wipe in progress

fgiunchedi moved this task from Blocked to Radar on the User-fgiunchedi board. Jun 22 2017, 9:55 AM

Disk wipe complete.

Papaul updated the task description. Jun 22 2017, 3:33 PM

switch port information
Row A
ms-be2001 - ge-1/0/1
ms-be2002 - ge-3/0/40
ms-be2003 - ge-4/0/40
ms-be2004 - ge-5/0/18

Row B
ms-be2005 - ge-1/0/4
ms-be2006 - ge-3/0/39
ms-be2007 - ge-4/0/16
ms-be2008 - ge-5/0/23

Row C
ms-be2009 - ge-1/0/9
ms-be2010 - ge-3/0/0
ms-be2011 - ge-4/0/0
ms-be2012 - ge-5/0/0
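
The port assignments were posted twice on this task, once as "port host" and once as "host - port". Before removing switch configuration it can be worth confirming that the two listings agree. The sketch below parses both formats into a hostname-to-port mapping and compares them. The names `parse_ports`, `ROBH_LISTING`, and `PAPAUL_LISTING` are hypothetical, and the mapping ignores which switch (asw-a/b/c-codfw) each port belongs to.

```python
import re

HOST_RE = re.compile(r"ms-be\d{4}")
PORT_RE = re.compile(r"ge-\d+/\d+/\d+")


def parse_ports(text):
    """Return {hostname: port} from lines containing one host and one port."""
    mapping = {}
    for line in text.splitlines():
        host = HOST_RE.search(line)
        port = PORT_RE.search(line)
        if host and port:  # header lines such as "Row A" are skipped
            mapping[host.group()] = port.group()
    return mapping


# First listing, copied from the task ("port host" order).
ROBH_LISTING = """
asw-a-codfw:
ge-1/0/1 ms-be2001
ge-3/0/40 ms-be2002
ge-4/0/40 ms-be2003
ge-5/0/18 ms-be2004
asw-b-codfw:
ge-1/0/4 ms-be2005
ge-3/0/39 ms-be2006
ge-4/0/16 ms-be2007
ge-5/0/23 ms-be2008
asw-c-codfw:
ge-1/0/9 ms-be2009
ge-3/0/0 ms-be2010
ge-4/0/0 ms-be2011
ge-5/0/0 ms-be2012
"""

# Second listing, copied from the task ("host - port" order).
PAPAUL_LISTING = """
Row A
ms-be2001 - ge-1/0/1
ms-be2002 - ge-3/0/40
ms-be2003 - ge-4/0/40
ms-be2004 - ge-5/0/18
Row B
ms-be2005 - ge-1/0/4
ms-be2006 - ge-3/0/39
ms-be2007 - ge-4/0/16
ms-be2008 - ge-5/0/23
Row C
ms-be2009 - ge-1/0/9
ms-be2010 - ge-3/0/0
ms-be2011 - ge-4/0/0
ms-be2012 - ge-5/0/0
"""

if __name__ == "__main__":
    first = parse_ports(ROBH_LISTING)
    second = parse_ports(PAPAUL_LISTING)
    # Report any host whose port differs between the two listings.
    mismatches = {h for h in first.keys() | second.keys()
                  if first.get(h) != second.get(h)}
    print(len(first), "hosts parsed; mismatches:", sorted(mismatches))
```

Searching each line for a host pattern and a port pattern independently means the parser does not care about column order or separators, which is why the same function handles both listings.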

Papaul updated the task description. Jun 22 2017, 5:20 PM

Change 361682 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS entries for ms-be20[0-1[1-9]

https://gerrit.wikimedia.org/r/361682

Change 361682 merged by Dzahn:
[operations/dns@master] DNS: Remove mgmt DNS entries for ms-be20[0-1[1-9]

https://gerrit.wikimedia.org/r/361682

Papaul updated the task description. Jun 28 2017, 2:36 PM

@RobH This is complete on my end. Thanks

Papaul reassigned this task from Papaul to RobH. Jun 28 2017, 2:38 PM
RobH moved this task from Backlog to Non-Urgent on the ops-codfw board. Jul 6 2017, 7:05 PM
RobH closed this task as Resolved. Jul 6 2017, 11:26 PM
RobH updated the task description.

removed all the descriptions from the disabled switch ports

Change 454250 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove stray swift backend servers from site.pp

https://gerrit.wikimedia.org/r/454250

Change 454250 merged by Muehlenhoff:
[operations/puppet@production] Remove stray swift backend servers from site.pp

https://gerrit.wikimedia.org/r/454250