decom cp3011-22 (12 machines)
Open, Normal, Public

Description

cp3011-22 were software-decommed from their old cache roles in T125485. They're still booted and puppeting, using just "include standard" for puppet in site.pp. They need a standard wipe and decom (or reclaim/spare). Note one exception: cp3011 has been powered off for a long time already due to hardware issues in T92306.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port (all hosts)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped by onsite
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once unracked.
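
The checklist above has three phases, and the middle one is meant to be done in one sitting. A minimal sketch of that ordering in Python follows. The step text is abbreviated from the description; `Step`, `safe_to_pause` and `next_stop_point` are hypothetical names for illustration and are not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    text: str
    interruptible: bool  # False for steps inside the non-interruptible block


# Step text abbreviated from the task description; order is preserved.
CHECKLIST = [
    Step("confirm services offline from production use", True),
    Step("set icinga checks to maint mode/disabled", True),
    Step("remove from lvs/pybal active configuration", True),
    Step("remove service group puppet/hiera/dsh config", True),
    Step("replace site.pp entry with role::spare", True),
    Step("disable puppet on host", False),
    Step("remove remaining puppet references", False),
    Step("power down host", False),
    Step("disable switch port", False),
    Step("remove production dns entries", False),
    Step("puppet node clean / deactivate, remove salt key", False),
    Step("disks wiped by onsite", True),
    Step("unrack, decommission, update racktables", True),
    Step("remove switch port configuration", True),
]


def safe_to_pause(done: int) -> bool:
    """True if work may stop after `done` completed steps."""
    if done <= 0 or done >= len(CHECKLIST):
        return True
    # Unsafe only when both the last finished step and the next step
    # belong to the non-interruptible block.
    return CHECKLIST[done - 1].interruptible or CHECKLIST[done].interruptible


def next_stop_point(done: int) -> int:
    """Smallest step count >= `done` at which pausing is safe."""
    while not safe_to_pause(done):
        done += 1
    return done


if __name__ == "__main__":
    for count in (3, 5, 7, 11):
        print(count, safe_to_pause(count), next_stop_point(count))
```

In this model, pausing right before "disable puppet on host" is fine, but once that step is done the next safe stopping point is after the puppet node clean/deactivate step.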
BBlack created this task. Mar 24 2016, 9:06 PM
Restricted Application added a project: Operations. Mar 24 2016, 9:06 PM
Restricted Application added a subscriber: Aklapper.
BBlack edited projects, added ops-esams; removed ops-eqiad.
fgiunchedi triaged this task as Normal priority. Apr 27 2016, 1:50 PM
Restricted Application added a subscriber: Southparkfan. Apr 27 2016, 1:50 PM

Script wmf_auto_reimage was launched by bblack on neodymium.eqiad.wmnet for hosts:

['cp3012.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610271211_bblack_12972.log.

To save future humans trouble: these can't be rebooted to PXE in any easy way, and the disks are behind raid controllers preventing secure erase, too :P

Dzahn added a subscriber: Dzahn. Jan 25 2017, 12:19 AM

Can we remove these from puppet and shut them down? They have been running idle since October.

cp3011, cp3014 for example show up in modules/torrus/tests/cdn.pp. Wondering what to replace them with in those torrus tests.

Change 334005 had a related patch set uploaded (by Dzahn):
site.pp, DHCP: remove cp3011-cp3022

https://gerrit.wikimedia.org/r/334005

Change 334015 had a related patch set uploaded (by Dzahn):
remove cp3011-cp3022 incl. mgmt

https://gerrit.wikimedia.org/r/334015

Change 334005 merged by Dzahn:
site.pp, DHCP: remove cp3011-cp3022

https://gerrit.wikimedia.org/r/334005

Mentioned in SAL (#wikimedia-operations) [2017-02-06T20:27:01Z] <mutante> cp3011 thru cp3022 - revoke puppet certs, puppet node deactivate (T130883)

Mentioned in SAL (#wikimedia-operations) [2017-02-06T20:49:41Z] <mutante> cp3011 thru cp3022 - shutdown / poweroff (T130883)

Change 334015 merged by Dzahn:
remove cp3011-cp3022, keep mgmt

https://gerrit.wikimedia.org/r/334015

Dzahn added a comment. Feb 6 2017, 10:21 PM

servers are now removed from puppet, salt and DNS (except mgmt) and have been shut down.

physical decom at the dc can follow

cp3014, cp3020 and cp3022 are still shown in servermon: https://servermon.wikimedia.org/hosts/

Probably "puppet node deactivate" was missing for those.

Mentioned in SAL (#wikimedia-operations) [2017-02-09T15:23:57Z] <ema> shutdown cp3020 T130883

Mentioned in SAL (#wikimedia-operations) [2017-02-09T21:54:01Z] <mutante> cp3014,cp3020,cp3022 - puppet node deactivate - cp3020 delete salt key (T130883)

Hmm, cp3014,cp3020,cp3022 are still listed in https://servermon.wikimedia.org/hosts/, though. No idea why, let's wait for @akosiaris to return.

disabled the network ports for the powered off systems cp3011-3022
robh@csw2-esams# show | compare
[edit interfaces xe-5/0/0]
+ disable;
[edit interfaces xe-5/0/1]
+ disable;
[edit interfaces xe-5/0/2]
+ disable;
[edit interfaces xe-5/0/3]
+ disable;
[edit interfaces xe-5/0/4]
+ disable;
[edit interfaces xe-5/0/5]
+ disable;
[edit interfaces xe-5/0/6]
+ disable;
[edit interfaces xe-5/0/7]
+ disable;
[edit interfaces xe-5/0/8]
+ disable;
[edit interfaces xe-5/0/9]
+ disable;
[edit interfaces xe-5/0/10]
+ disable;
[edit interfaces xe-5/0/11]
+ disable;

{master:0}[edit]
robh@csw2-esams# commit comment T130883
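
For reference, the twelve "+ disable;" lines in the diff above correspond to one `set interfaces ... disable` statement per port in Junos configuration mode. A small sketch that only generates that command list, assuming the ports are exactly xe-5/0/0 through xe-5/0/11 as shown in the diff; `disable_port_commands` is a hypothetical helper name, and nothing here connects to a switch.

```python
def disable_port_commands(fpc: int, pic: int, ports: range) -> list[str]:
    """Build Junos configuration-mode statements that disable each port."""
    return [f"set interfaces xe-{fpc}/{pic}/{port} disable" for port in ports]


# Ports 0..11 on xe-5/0, matching the 'show | compare' output above.
commands = disable_port_commands(5, 0, range(12))
# Same commit comment as used in the session above.
commands.append("commit comment T130883")

if __name__ == "__main__":
    print("\n".join(commands))
```

Reviewing the output of `show | compare` before the commit, as done above, is still the check that the right ports were touched.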

Dzahn added a comment. Feb 13 2017, 6:50 PM

Hmm, cp3014,cp3020,cp3022 are still listed in https://servermon.wikimedia.org/hosts/, though. No idea why, let's wait for @akosiaris to return.

i had a typo in "sudo puppet node deeactivate cp30s{node}.esams.wmnet". i ran "puppet node clean" and a proper "puppet node deactivate" on all of them again, and they are gone now.
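
One way to avoid this kind of typo is to generate the per-host commands from the host range and review them before running anything. A rough sketch, assuming the hosts follow the cpNNNN.esams.wmnet pattern seen earlier in this task; `hosts` and `puppet_cleanup_commands` are hypothetical helper names, and the script only prints the commands rather than executing them.

```python
def hosts(first: int, last: int, site: str = "esams") -> list[str]:
    """Expand a numeric range such as 3011..3022 into FQDNs."""
    return [f"cp{n}.{site}.wmnet" for n in range(first, last + 1)]


def puppet_cleanup_commands(hostnames: list[str]) -> list[str]:
    """Return the clean and deactivate commands for each host, in order."""
    commands = []
    for name in hostnames:
        commands.append(f"sudo puppet node clean {name}")
        commands.append(f"sudo puppet node deactivate {name}")
    return commands


if __name__ == "__main__":
    # Dry run: print for review, then paste or pipe to a shell by hand.
    for command in puppet_cleanup_commands(hosts(3011, 3022)):
        print(command)
```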

RobH updated the task description. Feb 13 2017, 6:56 PM
faidon moved this task from Backlog to Decommission on the ops-esams board. Aug 29 2017, 3:05 PM
Dzahn removed a subscriber: Dzahn. Sep 6 2017, 5:36 PM