
Decommission neodymium
Closed, Resolved · Public

Description

neodymium has been replaced by cumin1001. Before this host can be removed, the MySQL grants need to be removed:
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/466833/

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
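
The DebMonitor removal step in the checklist above can be wrapped in a small shell helper so that an unset ${HOST_FQDN} cannot send the DELETE to the bare /hosts/ path. This is only a sketch: the function names are hypothetical, the URL and certificate paths are copied from the checklist command, and the added --fail option (a standard curl flag) makes curl exit non-zero on an HTTP error status.

```shell
#!/bin/sh
# Hypothetical helper names; URL and certificate paths come from the checklist step.

# Print the DebMonitor URL for a host, refusing an empty FQDN so the
# DELETE is never sent to the bare /hosts/ collection.
debmonitor_host_url() {
    if [ -z "$1" ]; then
        echo "usage: debmonitor_host_url HOST_FQDN" >&2
        return 1
    fi
    printf 'https://debmonitor.discovery.wmnet/hosts/%s\n' "$1"
}

# Send the DELETE request with the same options as the checklist command.
# --fail makes curl return a non-zero exit status on an HTTP error.
remove_from_debmonitor() {
    url=$(debmonitor_host_url "$1") || return 1
    sudo curl --fail -X DELETE "$url" \
        --cert /etc/debmonitor/ssl/cert.pem \
        --key /etc/debmonitor/ssl/server.key
}
```

For this task the call would be remove_from_debmonitor neodymium.eqiad.wmnet, run from a host that holds the DebMonitor client certificate.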

Event Timeline

Restricted Application added a project: Operations. · Apr 9 2019, 1:44 PM
MoritzMuehlenhoff triaged this task as Medium priority. · Apr 9 2019, 1:47 PM
Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. · Apr 16 2019, 6:24 PM
jbond added a subscriber: jbond. · Jun 6 2019, 12:43 PM

I'm going to reimage this server to test the following change:
https://gerrit.wikimedia.org/r/c/operations/puppet/+/514689

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['neodymium.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906061246_jbond_84635.log.

Completed auto-reimage of hosts:

['neodymium.eqiad.wmnet']

Of which those FAILED:

['neodymium.eqiad.wmnet']

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['neodymium.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906061400_jbond_99543.log.

Completed auto-reimage of hosts:

['neodymium.eqiad.wmnet']

Of which those FAILED:

['neodymium.eqiad.wmnet']

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['neodymium.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906071310_jbond_114233.log.

Completed auto-reimage of hosts:

['neodymium.eqiad.wmnet']

and were ALL successful.

RobH assigned this task to MoritzMuehlenhoff. · Edited · Jul 26 2019, 3:28 PM
RobH added a subscriber: RobH.

@MoritzMuehlenhoff,

Have the grants for this system been removed so we can move forward with the decommission? I directed this to you since you created this task, and I assume you are a service owner.

Please comment, and if it's ready to start the decom process, check off the boxes and assign to me for follow-up. Thanks in advance!

Change 527043 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] mysql: remove grants for sarin and neodymium

https://gerrit.wikimedia.org/r/527043

Please comment, and if it's ready to start the decom process, check off the boxes and assign to me for follow-up. Thanks in advance!

This needs to wait until https://phabricator.wikimedia.org/T229796 is complete; I'll reassign the bug to you when that's done.

Change 534591 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove neodymium/sarin from MySQL root clients

https://gerrit.wikimedia.org/r/534591

Change 534591 merged by Muehlenhoff:
[operations/puppet@production] Remove neodymium/sarin from MySQL root clients

https://gerrit.wikimedia.org/r/534591

Change 534600 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Decommission neodymium

https://gerrit.wikimedia.org/r/534600

Change 534600 merged by Muehlenhoff:
[operations/puppet@production] Decommission neodymium

https://gerrit.wikimedia.org/r/534600

MoritzMuehlenhoff updated the task description.

This is ready for decom

cookbooks.sre.hosts.decommission executed by jmm@cumin1001 for hosts: neodymium.eqiad.wmnet

  • neodymium.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 538159 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove site.pp entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538159

Change 538159 merged by Muehlenhoff:
[operations/puppet@production] Remove site.pp entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538159

Change 538160 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/dns@master] Remove DNS entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538160

Change 538160 merged by Muehlenhoff:
[operations/dns@master] Remove DNS entries for neodymium/sarin

https://gerrit.wikimedia.org/r/538160

MoritzMuehlenhoff updated the task description.

Change 527043 abandoned by Jcrespo:
mysql: remove grants for sarin and neodymium

Reason:
Done somewhere else

https://gerrit.wikimedia.org/r/527043

RobH removed Cmjohnson as the assignee of this task. · Sep 26 2019, 12:41 PM
RobH added a subscriber: Cmjohnson.
Jclark-ctr updated the task description. · Feb 5 2020, 11:27 PM
RobH removed a subscriber: RobH. · Mar 3 2020, 6:01 PM
Papaul added a subscriber: Papaul. · Mar 19 2020, 6:04 PM
[edit interfaces interface-range vlan-private1-c-eqiad]
-    member ge-4/0/8;
[edit interfaces interface-range disabled]
     member ge-6/0/32 { ... }
+    member ge-4/0/8;
[edit interfaces]
-   ge-4/0/8 {
-       description neodymium;
-   }
Papaul updated the task description. · Mar 19 2020, 6:05 PM
Papaul closed this task as Resolved. · Mar 20 2020, 3:58 PM
Papaul updated the task description.

Complete

Dzahn added a subscriber: Dzahn. · Mar 20 2020, 4:12 PM

This is in Netbox with status "offline", but it should be "decommissioning", right?