Decommission mw1161-69
Closed, Resolved; Public

Description

  • all system services confirmed offline from production use
  • set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • remove system from all lvs/pybal active configuration
  • any service group puppet/hiera/dsh config removed
  • replace with role::spare::system

START NON-INTERRUPTIBLE STEPS

[x] - disable puppet on host
[x] - remove all remaining puppet references (including role::spare) https://gerrit.wikimedia.org/r/361581
[x] - power down host
[x] - disable switch port
[x] - switch port assignment noted on this task (for later removal)
[x] - remove production dns entries https://gerrit.wikimedia.org/r/361582
[x] - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

[x] - system disks wiped (by onsite)
[x] - system unracked and decommissioned (by onsite), update racktables with result
[x] - switch port configuration removed from switch once system is unracked
[x] - mgmt dns entries removed

Event Timeline

elukey created this task. Oct 4 2017, 12:18 PM

Current status for the jobrunners in eqiad:

elukey@neodymium:~$ sudo cumin '*.eqiad.wmnet and R:class = role::mediawiki::jobrunner' 'lldpcli show neighbors | grep SysName'
19 hosts will be targeted:
mw[1161-1167,1299-1306,1308-1311].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(7) mw[1161-1167].eqiad.wmnet
----- OUTPUT of 'lldpcli show nei...s | grep SysName' -----
    SysName:      asw-c-eqiad
===== NODE GROUP =====
(4) mw[1308-1311].eqiad.wmnet
----- OUTPUT of 'lldpcli show nei...s | grep SysName' -----
    SysName:      asw-a-eqiad
===== NODE GROUP =====
(8) mw[1299-1306].eqiad.wmnet
----- OUTPUT of 'lldpcli show nei...s | grep SysName' -----
    SysName:      asw-b-eqiad

Videoscalers:

elukey@neodymium:~$ sudo cumin '*.eqiad.wmnet and R:class = role::mediawiki::videoscaler' 'lldpcli show neighbors | grep SysName'
6 hosts will be targeted:
mw[1168-1169,1259-1260,1307,1318].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(2) mw[1168-1169].eqiad.wmnet
----- OUTPUT of 'lldpcli show nei...s | grep SysName' -----
    SysName:      asw-c-eqiad
===== NODE GROUP =====
(2) mw[1259,1307].eqiad.wmnet
----- OUTPUT of 'lldpcli show nei...s | grep SysName' -----
    SysName:      asw-a-eqiad
===== NODE GROUP =====
(2) mw[1260,1318].eqiad.wmnet
----- OUTPUT of 'lldpcli show nei...s | grep SysName' -----
    SysName:      asw-b-eqiad
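
The two cumin runs above report hosts in a compact bracket notation such as mw[1161-1167].eqiad.wmnet. As a rough illustration only, the sketch below expands that notation and checks which switch each host slated for decommission was seen on. The function names (expand_hosts, switch_for) and the SWITCH_GROUPS table are hypothetical helpers written for this note, not part of cumin; the group data is copied from the output above, and the parser assumes a single bracket group with no zero-padded numbers.

```python
import re


def expand_hosts(expr):
    """Expand 'mw[1161-1163,1299].eqiad.wmnet' into individual hostnames.

    Assumption: at most one bracket group and no zero-padded numbers.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        # No bracket group: treat the expression as a single hostname.
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a lone number is a range of one
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number}{suffix}")
    return hosts


# Node groups copied from the jobrunner and videoscaler runs above.
SWITCH_GROUPS = {
    "asw-a-eqiad": ["mw[1308-1311].eqiad.wmnet", "mw[1259,1307].eqiad.wmnet"],
    "asw-b-eqiad": ["mw[1299-1306].eqiad.wmnet", "mw[1260,1318].eqiad.wmnet"],
    "asw-c-eqiad": ["mw[1161-1167].eqiad.wmnet", "mw[1168-1169].eqiad.wmnet"],
}


def switch_for(host):
    """Return the switch a host was reported on, or None if it is not listed."""
    for switch, exprs in SWITCH_GROUPS.items():
        if any(host in expand_hosts(expr) for expr in exprs):
            return switch
    return None


decom = expand_hosts("mw[1161-1169].eqiad.wmnet")
print({switch_for(host) for host in decom})  # {'asw-c-eqiad'}
```

Per the output above, all nine hosts in this task (mw1161-69) sit behind asw-c-eqiad, which is the switch whose port configuration the later checklist steps refer to.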
elukey moved this task from Backlog to In Progress on the User-Elukey board. Oct 6 2017, 1:19 PM
Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Oct 8 2017, 2:41 PM
elukey moved this task from In Progress to Stalled on the User-Elukey board. Oct 20 2017, 12:29 PM
Joe moved this task from Backlog to Doing on the User-Joe board. Oct 25 2017, 8:06 AM
Joe updated the task description.

I did all the steps in decom up to the uninterruptible tasks. @Cmjohnson the servers are yours to fully decom.

Joe moved this task from Doing to Blocked on others on the User-Joe board. Nov 2 2017, 11:43 AM

Change 391234 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Removing site.pp and dhcp entries for decom host mw1161-69 T177387

https://gerrit.wikimedia.org/r/391234

@elukey or @Joe I went to finish the decom and found that 2 hosts still show up in puppet. Please give me the okay to proceed.

modules/service/files/logstash_checker.py: epilog='Example: logstash_checker.py --host mw1167 --user "<user>" -p')

modules/profile/templates/cumin/aliases.yaml.erb:mw-videoscaler-canary: P{mw1168.eqiad.wmnet}
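
For reference, a check like the one that surfaced the two leftovers above could be sketched roughly as follows. This is an assumption-laden illustration, not the command that was actually run: find_host_references is a hypothetical helper, and the repository path passed to it is whatever local checkout of operations/puppet is at hand.

```python
import re
from pathlib import Path


def find_host_references(root, hosts):
    """Return (relative_path, line_number, line) for every line naming a host.

    Word boundaries keep 'mw1168' from matching a longer name like 'mw11680'.
    """
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(h) for h in hosts) + r")\b")
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


# Short hostnames for the hosts in this task: mw1161 through mw1169.
DECOM_HOSTS = [f"mw{number}" for number in range(1161, 1170)]
```

Calling find_host_references("path/to/puppet", DECOM_HOSTS) on a checkout would list any file still naming one of the nine hosts, which helps separate harmless mentions (such as the example string in logstash_checker.py) from live configuration (such as the cumin canary alias).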

Cmjohnson updated the task description. Nov 14 2017, 4:55 PM

Change 391247 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing dns entries for decom hosts mw1161-69 T177387

https://gerrit.wikimedia.org/r/391247

Change 391234 merged by Cmjohnson:
[operations/puppet@production] Removing site.pp and dhcp entries for decom host mw1161-69 T177387

https://gerrit.wikimedia.org/r/391234

Change 391247 merged by Cmjohnson:
[operations/dns@master] Removing dns entries for decom hosts mw1161-69 T177387

https://gerrit.wikimedia.org/r/391247

Change 391522 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cumin: update videoscaler canary after the decom of mw1168

https://gerrit.wikimedia.org/r/391522

Change 391522 merged by Elukey:
[operations/puppet@production] cumin: update videoscaler canary after the decom of mw1168

https://gerrit.wikimedia.org/r/391522

elukey moved this task from Stalled to Done on the User-Elukey board. Nov 17 2017, 10:24 AM
Cmjohnson closed this task as Resolved. Nov 21 2017, 2:53 PM
Cmjohnson claimed this task.
Cmjohnson updated the task description.
Cmjohnson updated the task description.