Decommission ms-be1001 - ms-be1012
Closed, ResolvedPublic

Description

This task will track the decommissioning of ms-be1001 through ms-be1012.
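As a rough illustration of the range this task covers, the hostnames can be enumerated with a few lines of Python. This is only a sketch: `hosts_in_range` is a hypothetical helper written for this example, not part of any Wikimedia tooling.

```python
def hosts_in_range(prefix: str, first: int, last: int) -> list[str]:
    """Return the hostnames prefix<first> .. prefix<last>, inclusive."""
    return [f"{prefix}{number}" for number in range(first, last + 1)]


# The twelve hosts named in this task (ms-be1001 through ms-be1012).
hosts = hosts_in_range("ms-be", 1001, 1012)

for host in hosts:
    print(host)
```

A list like this can be handy for checking that every checklist step below was applied to all twelve hosts.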

This checklist must be applied to every single host in the range:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (include role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
asw-a-eqiad:
asw-a-eqiad:ge-1/0/6	ms-be1001
asw-a-eqiad:ge-1/0/7	ms-be1002
asw-a-eqiad:ge-8/0/11	ms-be1003
asw-a-eqiad:ge-8/0/12	ms-be1004
asw-a-eqiad:ge-1/0/8	ms-be1008
asw-a-eqiad:ge-8/0/2	ms-be1012

asw-c-eqiad:
asw-c-eqiad:ge-2/0/2	ms-be1005
asw-c-eqiad:ge-2/0/3	ms-be1006
asw-c-eqiad:ge-2/0/4	ms-be1007
asw-c-eqiad:ge-3/0/2	ms-be1009
asw-c-eqiad:ge-3/0/3	ms-be1010
asw-c-eqiad:ge-3/0/4	ms-be1011
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.
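The switch port assignments noted in the checklist above are what the final cleanup step needs once the systems are unracked. A minimal sketch of grouping those notes by switch follows; the port data is copied from this task, while `PORT_NOTES` and `ports_by_switch` are hypothetical names introduced for illustration only.

```python
from collections import defaultdict

# Port assignments exactly as noted on this task: "<switch>:<port> <host>".
PORT_NOTES = """\
asw-a-eqiad:ge-1/0/6 ms-be1001
asw-a-eqiad:ge-1/0/7 ms-be1002
asw-a-eqiad:ge-8/0/11 ms-be1003
asw-a-eqiad:ge-8/0/12 ms-be1004
asw-a-eqiad:ge-1/0/8 ms-be1008
asw-a-eqiad:ge-8/0/2 ms-be1012
asw-c-eqiad:ge-2/0/2 ms-be1005
asw-c-eqiad:ge-2/0/3 ms-be1006
asw-c-eqiad:ge-2/0/4 ms-be1007
asw-c-eqiad:ge-3/0/2 ms-be1009
asw-c-eqiad:ge-3/0/3 ms-be1010
asw-c-eqiad:ge-3/0/4 ms-be1011
"""


def ports_by_switch(notes: str) -> dict[str, dict[str, str]]:
    """Group 'switch:port host' lines into {switch: {host: port}}."""
    grouped: dict[str, dict[str, str]] = defaultdict(dict)
    for line in notes.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and bare "switch:" header lines
        location, host = parts
        switch, _, port = location.partition(":")
        grouped[switch][host] = port
    return dict(grouped)


ports = ports_by_switch(PORT_NOTES)
for switch, assignments in ports.items():
    for host, port in sorted(assignments.items()):
        print(f"{switch} {port} {host}")
```

Grouping by switch keeps the per-switch cleanup list in one place, which matches how the ports are noted above.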

Event Timeline

fgiunchedi triaged this task as Medium priority. May 29 2017, 8:54 AM
fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board.

Mentioned in SAL (#wikimedia-operations) [2017-05-31T17:30:47Z] <godog> swift eqiad-prod decom ms-be100[128] - T166489

Mentioned in SAL (#wikimedia-operations) [2017-06-05T08:10:11Z] <godog> swift eqiad-prod decom ms-be1009 / 10 / 11 - T166489

Mentioned in SAL (#wikimedia-operations) [2017-06-08T08:58:23Z] <godog> swift eqiad-prod eqiad-prod: decom ms-be1005/6/7 - T166489

Mentioned in SAL (#wikimedia-operations) [2017-06-12T09:25:27Z] <godog> swift eqiad-prod finish decom ms-be1005/6/7 - T166489

can they be shutdown at this point? ms-be1001 had a hardware fail today and i powercycled it before realizing they are already scheduled for decom. could i continue by shutting them down (#greenIT) and remove it from puppet/icinga etc?

@Dzahn almost, I'm running the last swift ring rebalance today. ETA is two/three days, I'll update/reassign this task once the machines are good to decom!

Mentioned in SAL (#wikimedia-operations) [2017-06-19T09:18:32Z] <godog> swift eqiad-prod: remove ms-be1001 - ms-be1012 - T166489

Change 360621 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Move ms-be10[01-12] to spare systems for decom

https://gerrit.wikimedia.org/r/360621

Change 360621 merged by Filippo Giunchedi:
[operations/puppet@production] Move ms-be10[01-12] to spare systems for decom

https://gerrit.wikimedia.org/r/360621

fgiunchedi edited projects, added hardware-requests; removed Patch-For-Review.
fgiunchedi updated the task description.
fgiunchedi removed subscribers: gerritbot, Stashbot.
fgiunchedi added a subscriber: RobH.

@Dzahn machines are marked as spares now and good to be decom'd /cc @RobH

Change 360665 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decommission of ms-be1001 thorugh ms-be1012

https://gerrit.wikimedia.org/r/360665

Change 360665 merged by RobH:
[operations/puppet@production] decommission of ms-be1001 thorugh ms-be1012

https://gerrit.wikimedia.org/r/360665

Change 360671 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decommission of ms-be1001 through ms-be1012

https://gerrit.wikimedia.org/r/360671

Change 360671 merged by RobH:
[operations/dns@master] decommission of ms-be1001 through ms-be1012

https://gerrit.wikimedia.org/r/360671

RobH edited projects, added ops-eqiad; removed Patch-For-Review.
RobH updated the task description.

Ok, these are shut down (switch ports disabled) with all required steps done. Next step is for Chris to wipe the disks, then follow the remaining steps for these 12 systems.

Cmjohnson updated the task description.

resolved