
Decommission sarin
Closed, ResolvedPublic

Description

sarin has been replaced by cumin2001. Before this host can be removed, the MySQL grants need to be removed:
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/466833/

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host

[x] - disable switch port

  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
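
The DebMonitor removal step in the checklist above can also be sketched in Python, as an illustration only. The base URL and the certificate and key paths are taken from the curl command in the checklist; the function names `build_delete_request` and `remove_host` are hypothetical, and the sketch assumes the endpoint accepts a plain client-certificate-authenticated DELETE, exactly as the curl invocation implies.

```python
import ssl
import urllib.request

# Base URL as given in the checklist's curl command.
DEBMONITOR_BASE = "https://debmonitor.discovery.wmnet"


def build_delete_request(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one host.

    Hypothetical helper: it only mirrors the URL shape of the curl command.
    """
    if not host_fqdn or "/" in host_fqdn:
        raise ValueError(f"not a plain FQDN: {host_fqdn!r}")
    return urllib.request.Request(
        f"{DEBMONITOR_BASE}/hosts/{host_fqdn}", method="DELETE"
    )


def remove_host(
    host_fqdn: str,
    cert: str = "/etc/debmonitor/ssl/cert.pem",   # path from the checklist
    key: str = "/etc/debmonitor/ssl/server.key",  # path from the checklist
) -> int:
    """Send the DELETE with a client certificate and return the HTTP status.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    returned status means the server answered with a success code.
    """
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert, keyfile=key)
    request = build_delete_request(host_fqdn)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

Calling `remove_host("sarin.codfw.wmnet")` would need to run as a user that can read the key file, which is why the checklist command uses sudo.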

Event Timeline

Restricted Application added a subscriber: Aklapper. Apr 9 2019, 1:45 PM
MoritzMuehlenhoff triaged this task as Medium priority. Apr 9 2019, 1:47 PM
Restricted Application added a project: Operations. Apr 9 2019, 1:47 PM
Dzahn moved this task from Backlog to Decommission on the ops-codfw board. Apr 12 2019, 12:07 AM
jbond added a subscriber: jbond. Jun 6 2019, 6:56 PM

going to re-image this server to stretch, testing changes to late_command.sh

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['sarin.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906061858_jbond_159535.log.

Completed auto-reimage of hosts:

['sarin.codfw.wmnet']

Of which those FAILED:

['sarin.codfw.wmnet']

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['sarin.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906071003_jbond_74120.log.

Completed auto-reimage of hosts:

['sarin.codfw.wmnet']

Of which those FAILED:

['sarin.codfw.wmnet']
RobH added a subscriber: RobH.

@MoritzMuehlenhoff,

Have the grants for this system been removed so we can move forward with decommission? Directed this to you since you created this task, and I assume you are a service owner?

Please comment, and if it's ready to start the decom process, check off the boxes and assign to me for follow-up. Thanks in advance!

Change 527043 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] mysql: remove grants for sarin and neodymium

https://gerrit.wikimedia.org/r/527043

@RobH This needs to wait until https://phabricator.wikimedia.org/T229796 is complete, I'll reassign the bug to you when that's done.

Change 534598 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Decomission sarin

https://gerrit.wikimedia.org/r/534598

Change 534598 merged by Muehlenhoff:
[operations/puppet@production] Decomission sarin

https://gerrit.wikimedia.org/r/534598

MoritzMuehlenhoff updated the task description.

This is ready to be decommissioned now.

cookbooks.sre.hosts.decommission executed by jmm@cumin1001 for hosts: sarin.codfw.wmnet

  • sarin.codfw.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 538159 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove site.pp entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538159

Change 538159 merged by Muehlenhoff:
[operations/puppet@production] Remove site.pp entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538159

Change 538160 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/dns@master] Remove DNS entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538160

Change 538160 merged by Muehlenhoff:
[operations/dns@master] Remove DNS entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538160

MoritzMuehlenhoff updated the task description.
papaul@asw-a-codfw# show | compare 
[edit interfaces interface-range vlan-private1-a-codfw]
-    member ge-5/0/16;
[edit interfaces interface-range disabled]
     member ge-6/0/15 { ... }
+    member ge-5/0/16;
[edit interfaces]
-   ge-5/0/16 {
-       description sarin;
-       enable;
-   }
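
The switch diff above moves ge-5/0/16 out of the vlan-private1-a-codfw interface-range, adds it to the disabled range, and drops the interface's own stanza. As a reading aid, here is a minimal Python sketch that groups the added and removed statements of such `show | compare` output by their `[edit ...]` stanza. The function name `summarize_compare` is hypothetical, and the parser assumes only the simple line shapes visible in this task (an `[edit ...]` header, then lines starting with `+`, `-`, or a space).

```python
import re
from collections import defaultdict

# The diff as pasted in this task, without the CLI prompt line.
SAMPLE = """
[edit interfaces interface-range vlan-private1-a-codfw]
-    member ge-5/0/16;
[edit interfaces interface-range disabled]
     member ge-6/0/15 { ... }
+    member ge-5/0/16;
[edit interfaces]
-   ge-5/0/16 {
-       description sarin;
-       enable;
-   }
"""


def summarize_compare(diff_text: str) -> dict:
    """Group added/removed statements by the [edit ...] stanza they fall under."""
    changes = defaultdict(lambda: {"added": [], "removed": []})
    stanza = None
    for line in diff_text.splitlines():
        header = re.match(r"^\[edit(?: (.*))?\]$", line.strip())
        if header:
            stanza = header.group(1) or ""
            continue
        if stanza is None:
            continue  # ignore anything before the first stanza header
        if line.startswith("+"):
            changes[stanza]["added"].append(line[1:].strip())
        elif line.startswith("-"):
            changes[stanza]["removed"].append(line[1:].strip())
        # lines starting with a space are unchanged context and are skipped
    return dict(changes)


if __name__ == "__main__":
    for stanza, delta in summarize_compare(SAMPLE).items():
        print(stanza, delta)
```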
Papaul updated the task description. Sep 25 2019, 6:40 PM

Change 527043 abandoned by Jcrespo:
mysql: remove grants for sarin and neodymium

Reason:
Done somewhere else

https://gerrit.wikimedia.org/r/527043

Papaul updated the task description. Oct 7 2019, 3:16 PM
Papaul updated the task description. Oct 7 2019, 8:06 PM

Change 541907 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for sarin,db2050 and db2055

https://gerrit.wikimedia.org/r/541907

Change 541907 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for sarin,db2050 and db2055

https://gerrit.wikimedia.org/r/541907

Papaul closed this task as Resolved. Oct 9 2019, 8:20 PM
Papaul updated the task description.

complete

RobH removed a subscriber: RobH. Oct 9 2019, 9:32 PM