
Decommission old memcached hosts - mc1001->mc1018
Closed, Resolved (Public)

Description

The old mc1001->mc1018 hosts need to be decommissioned (new nodes are already serving traffic).

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - Set role::spare (system was not shut down)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries https://gerrit.wikimedia.org/r/#/c/346823/
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS
[x] - system disks wiped (by onsite)

  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.
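
The per-host cleanup in the non-interruptible steps (puppet node clean, puppet node deactivate, salt key removal) can be sketched as a dry run that only prints the commands for review. This is a minimal sketch, not the procedure actually used on this task: the `eqiad.wmnet` domain suffix and the assumption that the salt minion ID equals the FQDN are not stated in the task, and the helper names are hypothetical.

```python
# Dry-run sketch: build the cleanup commands for mc1001->mc1018.
# Nothing is executed; the commands are only printed for review.
# ASSUMPTIONS (not from the task): the domain suffix "eqiad.wmnet",
# and that each host's salt minion ID is its FQDN.


def expand_hosts(prefix, first, last, domain):
    """Return FQDNs for prefix+number over an inclusive numeric range."""
    return [f"{prefix}{n}.{domain}" for n in range(first, last + 1)]


def cleanup_commands(fqdn):
    """Return the three cleanup commands named in the checklist for one host."""
    return [
        f"puppet node clean {fqdn}",       # remove the cert and cached data
        f"puppet node deactivate {fqdn}",  # mark the node inactive in PuppetDB
        f"salt-key -d {fqdn}",             # delete the accepted salt key
    ]


def build_plan(hosts):
    """Flatten the per-host commands into one ordered list."""
    return [cmd for host in hosts for cmd in cleanup_commands(host)]


if __name__ == "__main__":
    for line in build_plan(expand_hosts("mc", 1001, 1018, "eqiad.wmnet")):
        print(line)
```

Printing rather than executing keeps the list reviewable before anything is run on the puppet or salt master, which matters here because these steps sit inside the non-interruptible section.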

Event Timeline

Change 354453 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove any reference of mc1001->mc1018 for decom

https://gerrit.wikimedia.org/r/354453

elukey added a subscriber: Cmjohnson.

@Cmjohnson: The hosts are ready for the non interruptible steps, including https://gerrit.wikimedia.org/r/354453, so I haven't merged it yet. Icinga alarms are off.

@elukey, @Joe, @Cmjohnson: to test the migration of the reimage script from salt to cumin, could I grab mc100[1-2] in the next few days as test hosts?

They are already in the spare role in puppet, but let me know if you see any reason not to take those.

Mentioned in SAL (#wikimedia-operations) [2017-09-08T14:58:08Z] <volans> testing wmf-auto-reimage also on mc1002 T166300 T164341

@Cmjohnson FYI I'm no longer using the above hosts for testing.

Change 354453 abandoned by Elukey:
Remove any reference of mc1001->mc1018 for decom

https://gerrit.wikimedia.org/r/354453

Change 397906 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Removing site.pp and dhcpd file entries for mc1001-18 T164341

https://gerrit.wikimedia.org/r/397906

Change 397906 merged by Cmjohnson:
[operations/puppet@production] Removing site.pp and dhcpd file entries for mc1001-18 T164341

https://gerrit.wikimedia.org/r/397906

Change 398066 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] removing dns entries for decom hosts mc1001-1018 T164341

https://gerrit.wikimedia.org/r/398066

Change 398066 merged by Cmjohnson:
[operations/dns@master] removing dns entries for decom hosts mc1001-1018 T164341

https://gerrit.wikimedia.org/r/398066

These are still showing up in https://servermon.wikimedia.org/hosts/, probably because "puppet node deactivate" has not been run.

Cmjohnson updated the task description.

These all had SSDs, and they have been removed. The SSDs will not be included with the servers.