
Decommission iron
Closed, Resolved (Public)

Description

This is a priority since it's taking space in rack B4.

iron was bought in January 2011, so it's over eight years old! Before this host can be removed, its two remaining use cases (install access for WMCS and the Yubikey bastion) need to be moved elsewhere.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) - asw2-b-eqiad:ge-4/0/8
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS
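
The Puppet and DebMonitor cleanup commands from the steps above can be collected in a small dry-run helper. This is only a sketch: `decom_cleanup_cmds` is a hypothetical function name, not part of any existing tooling, and it prints the commands for review instead of executing them. The commands themselves are taken from the checklist as written.

```shell
#!/bin/sh
# Sketch only: prints the cleanup commands from the checklist, does not run them.
# decom_cleanup_cmds is a hypothetical helper name, not existing tooling.
set -eu

decom_cleanup_cmds() {
    host_fqdn="$1"
    # Reject empty names, unexpected characters, and names without a dot,
    # so a typo cannot end up in the DebMonitor URL.
    case "$host_fqdn" in
        ''|*[!A-Za-z0-9.-]*)
            echo "error: invalid hostname '$host_fqdn'" >&2
            return 1 ;;
        *.*) ;;
        *)
            echo "error: expected an FQDN, got '$host_fqdn'" >&2
            return 1 ;;
    esac
    printf '%s\n' \
        "puppet node clean ${host_fqdn}" \
        "puppet node deactivate ${host_fqdn}" \
        "sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${host_fqdn} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key"
}

# Dry run for the host on this task
decom_cleanup_cmds iron.wikimedia.org
```

Printing rather than executing leaves it to the operator to run each command on the right machine; per the checklist, the DebMonitor call is made from neodymium/sarin.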

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Event Timeline

Restricted Application added a subscriber: Aklapper. Apr 9 2019, 1:48 PM
MoritzMuehlenhoff triaged this task as Medium priority. Apr 9 2019, 1:49 PM
Restricted Application added a project: Operations. Apr 10 2019, 10:55 PM
Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Apr 16 2019, 6:15 PM
Krenair added a subscriber: Krenair. Jun 3 2019, 6:47 PM

'install access for WMCS' struck me as odd so I asked around a bit:

<bstorm_> Iron has been used for cloudvirt installs in the past
<andrewbogott> Normally we access new unpuppetized servers from the puppetmasters.  They aren't allowed to access the cloudvirts though, due to a network rule I don't understand.  So we use iron instead.
<andrewbogott> As far as I know it's still the only way.
<bstorm_> Yup

modules/role/manifests/bastionhost/twofa.pp: include ::profile::access_new_install

other non-puppetmaster stuff with this profile: cumin1001, cumin2001 (via role::cluster::management and profile::spicerack). Don't know if the network rules would permit it from there either. Might be interesting to find out exactly what the problematic rule is and why iron is okay. hysterical raisins?

Cmjohnson raised the priority of this task from Medium to High. Aug 20 2019, 6:43 PM
Cmjohnson updated the task description.

iron really needs to be decommed at this point: it's extremely old and we need the space in the rack for new servers. I'm adding @ayounsi to CC. Arzhel, for the initial (pre-puppet run) SSH access we currently use install_console from the Cumin hosts or Puppet masters. Per @Andrew's comment above, this doesn't work for connecting to e.g. cloudvirt. Is there any router ACL which grants SSH access from iron.wikimedia.org towards labs-hosts-b-eqiad1/labs-hosts-d-eqiad1 and which isn't present for the puppetmaster*/cumin* hosts? If there is such a rule, we should carry it over to the ACLs for puppetmaster/cumin, as the eventual goal is to remove iron fully.
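
One way to narrow down the ACL question is to run the same TCP check from iron and from a cumin or puppetmaster host and compare the results. A minimal sketch, assuming `nc` is installed on those hosts; `check_ssh_reachable`, the `NC` override, and any target name passed to it are hypothetical and not taken from this task.

```shell
#!/bin/sh
# Sketch: report whether TCP port 22 on a target answers from the host this runs on.
# NC can be overridden to point at a different netcat binary (assumption).
NC="${NC:-nc}"

check_ssh_reachable() {
    target="$1"
    # -z: connect without sending data; -w 5: give up after 5 seconds
    if "$NC" -z -w 5 "$target" 22 >/dev/null 2>&1; then
        echo "$(uname -n): ${target}:22 reachable"
    else
        # A filtered port and a host that is simply down look the same here
        echo "$(uname -n): ${target}:22 blocked or unreachable"
        return 1
    fi
}
```

If the check succeeds from iron but fails from cumin1001 against the same cloudvirt host, that would point to a source-specific rule. It does not identify which ACL term is responsible, so the router configuration still has to be checked.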

Change 531867 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Decom iron

https://gerrit.wikimedia.org/r/531867

Change 531867 merged by Muehlenhoff:
[operations/puppet@production] Decom iron

https://gerrit.wikimedia.org/r/531867

MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff added a subscriber: RobH.

Claiming the task for another test of the updated decom cookbook.

cookbooks.sre.hosts.decommission executed by jmm@cumin1001 for hosts: iron.wikimedia.org

  • iron.wikimedia.org (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 538046 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove remaining Puppet references for iron

https://gerrit.wikimedia.org/r/538046

Change 538046 merged by Muehlenhoff:
[operations/puppet@production] Remove remaining Puppet references for iron

https://gerrit.wikimedia.org/r/538046

Change 538049 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/dns@master] Remove DNS entry for iron

https://gerrit.wikimedia.org/r/538049

Change 538049 merged by Muehlenhoff:
[operations/dns@master] Remove DNS entry for iron

https://gerrit.wikimedia.org/r/538049

MoritzMuehlenhoff updated the task description.

Back to Rob for switch port removal

RobH lowered the priority of this task from High to Medium. Sep 19 2019, 4:16 PM
RobH updated the task description.
RobH reassigned this task from RobH to Cmjohnson. Sep 19 2019, 4:19 PM

Ready for disk wipes and continued decom process.

Decom system; don't return to spares.

Cmjohnson reassigned this task from Cmjohnson to Jclark-ctr. Sep 19 2019, 8:58 PM
Cmjohnson added a subscriber: Cmjohnson.

John, please wipe the servers, remove from the rack, update netbox and the tracking sheet. Assign back to me once you finish so I can kill the switch ports.

Jclark-ctr updated the task description. Oct 11 2019, 10:48 PM
Papaul added a subscriber: Papaul. Oct 12 2019, 12:09 AM
apaul@asw2-b-eqiad# show | compare 
[edit interfaces]
-   ge-4/0/8 {
-       description iron;
-   }
Papaul updated the task description. Oct 12 2019, 12:10 AM

Change 542635 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for iron

https://gerrit.wikimedia.org/r/542635

Change 542635 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for iron

https://gerrit.wikimedia.org/r/542635

Papaul closed this task as Resolved. Oct 12 2019, 12:22 AM
Papaul updated the task description.

Complete

Peachey88 updated the task description. Oct 12 2019, 10:23 AM