Decommission ms-be1001 - ms-be1012
Closed, Resolved · Public

Description

This task will track the decommissioning of ms-be1001 through ms-be1012.

This checklist must be applied to every single host in the range:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role::spare if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (include role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
asw-a-eqiad:
asw-a-eqiad:ge-1/0/6	ms-be1001
asw-a-eqiad:ge-1/0/7	ms-be1002
asw-a-eqiad:ge-8/0/11	ms-be1003
asw-a-eqiad:ge-8/0/12	ms-be1004
asw-a-eqiad:ge-1/0/8	ms-be1008
asw-a-eqiad:ge-8/0/2	ms-be1012

asw-c-eqiad:
asw-c-eqiad:ge-2/0/2	ms-be1005
asw-c-eqiad:ge-2/0/3	ms-be1006
asw-c-eqiad:ge-2/0/4	ms-be1007
asw-c-eqiad:ge-3/0/2	ms-be1009
asw-c-eqiad:ge-3/0/3	ms-be1010
asw-c-eqiad:ge-3/0/4	ms-be1011
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.
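
Because the checklist has to be applied to every host in the range, it can help to cross-check the switch port notes against the full host list before the ports are disabled. The following is a minimal sketch, not part of the actual decom tooling: the helper names `expand_hosts` and `parse_ports` are hypothetical, and the port data is copied from the checklist above.

```python
# Hypothetical helpers for cross-checking the port notes in this task.
# The assignments below are copied from the checklist; nothing here
# talks to a switch or to any production system.

PORT_ASSIGNMENTS = """
asw-a-eqiad:
asw-a-eqiad:ge-1/0/6 ms-be1001
asw-a-eqiad:ge-1/0/7 ms-be1002
asw-a-eqiad:ge-8/0/11 ms-be1003
asw-a-eqiad:ge-8/0/12 ms-be1004
asw-a-eqiad:ge-1/0/8 ms-be1008
asw-a-eqiad:ge-8/0/2 ms-be1012

asw-c-eqiad:
asw-c-eqiad:ge-2/0/2 ms-be1005
asw-c-eqiad:ge-2/0/3 ms-be1006
asw-c-eqiad:ge-2/0/4 ms-be1007
asw-c-eqiad:ge-3/0/2 ms-be1009
asw-c-eqiad:ge-3/0/3 ms-be1010
asw-c-eqiad:ge-3/0/4 ms-be1011
"""


def expand_hosts(prefix, first, last):
    """Return hostnames prefix+first .. prefix+last, inclusive."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def parse_ports(text):
    """Map each hostname to its (switch, interface) pair."""
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and bare switch headers
        switch, _, interface = fields[0].partition(":")
        ports[fields[1]] = (switch, interface)
    return ports


hosts = expand_hosts("ms-be", 1001, 1012)
ports = parse_ports(PORT_ASSIGNMENTS)

# Hosts in the range that have no switch port noted on the task.
missing = sorted(set(hosts) - set(ports))
print("hosts without a noted port:", missing)
```

An empty `missing` list means every host in the range has a port recorded for the later switch cleanup step.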

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 29 2017, 8:53 AM
fgiunchedi triaged this task as Normal priority. · May 29 2017, 8:54 AM
fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board.

Mentioned in SAL (#wikimedia-operations) [2017-05-31T17:30:47Z] <godog> swift eqiad-prod decom ms-be100[128] - T166489

Mentioned in SAL (#wikimedia-operations) [2017-06-05T08:10:11Z] <godog> swift eqiad-prod decom ms-be1009 / 10 / 11 - T166489

Mentioned in SAL (#wikimedia-operations) [2017-06-08T08:58:23Z] <godog> swift eqiad-prod eqiad-prod: decom ms-be1005/6/7 - T166489

Mentioned in SAL (#wikimedia-operations) [2017-06-12T09:25:27Z] <godog> swift eqiad-prod finish decom ms-be1005/6/7 - T166489

Dzahn added a subscriber: Dzahn. · Jun 16 2017, 7:16 PM

Can they be shut down at this point? ms-be1001 had a hardware failure today and I power-cycled it before realizing they are already scheduled for decom. Could I continue by shutting them down (#greenIT) and removing them from puppet/icinga etc.?

@Dzahn almost, I'm running the last swift ring rebalance today. ETA is two/three days, I'll update/reassign this task once the machines are good to decom!

Mentioned in SAL (#wikimedia-operations) [2017-06-19T09:18:32Z] <godog> swift eqiad-prod: remove ms-be1001 - ms-be1012 - T166489

Dzahn awarded a token. · Jun 20 2017, 6:47 PM

Change 360621 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Move ms-be10[01-12] to spare systems for decom

https://gerrit.wikimedia.org/r/360621

Change 360621 merged by Filippo Giunchedi:
[operations/puppet@production] Move ms-be10[01-12] to spare systems for decom

https://gerrit.wikimedia.org/r/360621

fgiunchedi removed fgiunchedi as the assignee of this task. · Jun 21 2017, 9:56 AM
fgiunchedi edited projects, added hardware-requests; removed Patch-For-Review.
fgiunchedi updated the task description.
fgiunchedi removed subscribers: gerritbot, Stashbot.
fgiunchedi added a subscriber: RobH.

@Dzahn machines are marked as spares now and good to be decom'd /cc @RobH

fgiunchedi moved this task from Doing to Blocked on the User-fgiunchedi board. · Jun 21 2017, 9:57 AM
RobH claimed this task. · Jun 21 2017, 3:24 PM
RobH moved this task from Backlog to Reclaim (Spares/Decommission) on the hardware-requests board.
RobH updated the task description. · Jun 21 2017, 4:10 PM

Change 360665 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decommission of ms-be1001 thorugh ms-be1012

https://gerrit.wikimedia.org/r/360665

Change 360665 merged by RobH:
[operations/puppet@production] decommission of ms-be1001 thorugh ms-be1012

https://gerrit.wikimedia.org/r/360665

RobH updated the task description. · Jun 21 2017, 4:39 PM

Change 360671 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decommission of ms-be1001 through ms-be1012

https://gerrit.wikimedia.org/r/360671

Change 360671 merged by RobH:
[operations/dns@master] decommission of ms-be1001 through ms-be1012

https://gerrit.wikimedia.org/r/360671

RobH reassigned this task from RobH to Cmjohnson. · Jun 21 2017, 4:52 PM
RobH edited projects, added ops-eqiad; removed Patch-For-Review.
RobH updated the task description.

Ok, these are shut down (switch ports disabled) with all required steps done. The next step is for Chris to wipe the disks, then follow the remaining steps for these 12 systems.

RobH moved this task from Backlog to Not urgent on the ops-eqiad board. · Jun 21 2017, 8:21 PM
fgiunchedi moved this task from Blocked to Radar on the User-fgiunchedi board. · Jun 22 2017, 9:55 AM
Cmjohnson moved this task from Not urgent to Decommission on the ops-eqiad board. · Jul 20 2017, 3:24 PM
Cmjohnson closed this task as Resolved. · Oct 25 2017, 6:11 PM
Cmjohnson updated the task description.

resolved