
Decommission cp4011, cp4012, cp4019, cp4020
Closed, Resolved, Public

Description

cp4011, cp4012, cp4019, cp4020 are ready for decom. They're switched to role::spare::system in puppet and have been freshly reinstalled in that role (no leftover services possible). These do need secure erase of drives to avoid leaking key material.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (include role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS
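
As a rough illustration of the "puppet node clean, puppet node deactivate, salt key removed" step, the sketch below only prints the commands for each host rather than running them. It assumes the puppet certname and the salt minion ID are both the host's FQDN under ulsfo.wmnet; the helper name `cleanup_commands` is hypothetical and not part of any existing tooling.

```python
# Dry-run sketch: prints the cleanup commands, does not execute anything.
# Assumption: certname and salt minion ID both equal the host FQDN.
HOSTS = ["cp4011", "cp4012", "cp4019", "cp4020"]
DOMAIN = "ulsfo.wmnet"


def cleanup_commands(host, domain=DOMAIN):
    """Return the puppet/salt cleanup commands for one host, in order."""
    fqdn = f"{host}.{domain}"
    return [
        f"puppet node clean {fqdn}",       # remove the node's cert and cached data
        f"puppet node deactivate {fqdn}",  # mark the node inactive in the puppet database
        f"salt-key -d {fqdn}",             # delete the minion key (prompts for confirmation)
    ]


for host in HOSTS:
    for command in cleanup_commands(host):
        print(command)
```

Printing the commands first makes it easy to review the host list before anything irreversible is run on the puppet or salt master.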

  • - system disks wiped (by onsite)

The remainder cannot happen until we are done with ALL the old CP systems to unrack them in a batch.

They can swap spots in the racks with the new cp21+ systems.

  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.

Event Timeline

RobH updated the task description.

Port assignments for later update:

xe-1/0/10 cp4019
xe-1/0/11 cp4020
xe-2/0/11 cp4011
xe-2/0/12 cp4012

Ok, one of those port assignments is bad, since disabling them all brought down cp4010.

I'll need to go onsite to determine what ports these actually plug into. It seems the port labeling in ulsfo is NOT accurate. (It was moved years ago between floors by folks who typically do not do our DC ops, so mistakes happen!)

Change 361702 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom of cp4011, cp4012, cp4019, cp4020

https://gerrit.wikimedia.org/r/361702

Ok, I had to audit and fix all the switch ports for cp systems in ulsfo. They have ALL been audited, and each new port description is set to cpname.ulsfo.wmnet just to show it was recently audited. (When all the ports are accounted for, I'll change them to the standard naming, which excludes the FQDN.)

The actual ports for the decommissioned servers are:

xe-1/0/9 cp4019.ulsfo.wmnet
xe-1/0/11 cp4020.ulsfo.wmnet
xe-2/0/8 cp4012.ulsfo.wmnet
xe-2/0/9 cp4011.ulsfo.wmnet
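
To make the difference between the two port lists on this task explicit, here is a small comparison sketch. The port data is copied from the two lists above; the function name `mismatched_ports` is only illustrative.

```python
# Port assignments first noted on the task (taken from the switch port labels).
recorded = {
    "cp4019": "xe-1/0/10",
    "cp4020": "xe-1/0/11",
    "cp4011": "xe-2/0/11",
    "cp4012": "xe-2/0/12",
}

# Port assignments found by the onsite audit.
audited = {
    "cp4019": "xe-1/0/9",
    "cp4020": "xe-1/0/11",
    "cp4012": "xe-2/0/8",
    "cp4011": "xe-2/0/9",
}


def mismatched_ports(recorded, audited):
    """Return {host: (recorded_port, audited_port)} for hosts whose ports differ."""
    return {
        host: (recorded[host], audited.get(host))
        for host in sorted(recorded)
        if recorded[host] != audited.get(host)
    }


mismatches = mismatched_ports(recorded, audited)
for host, (old, new) in mismatches.items():
    print(f"{host}: recorded {old}, audited {new}")
```

Only cp4020 matches between the two lists; the recorded ports for cp4011, cp4012 and cp4019 all differ from the audited ones.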

I planned two commits on the network stack, but had to go with three.

  1. Audited and updated ALL cp system port descriptions; they were ALL incorrect. The patch ONLY included description changes.
  2. Second correction added lvs4004, which wasn't labeled; this was quite evident once the first patch was live.
  3. Third commit disabled the ports on the above 4 systems. Unlike this AM (when the port descriptions were wrong), no unexpected systems went offline due to the ports being disabled.

So these 4 are now ready for disk wipes, which I shall start on immediately.

cp4020 securely wiped using hdparm from a Finnix USB boot stick (Debian live lacked the hdparm utilities).

cp4019: used hdparm to securely erase the SSDs.
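
For reference, a typical ATA secure erase with hdparm looks roughly like the sequence below. This is a dry-run sketch that only prints the commands. The exact invocation used on these hosts is not recorded on this task, so the device path `/dev/sdX` and the temporary password `p` are placeholders, and the helper name `secure_erase_commands` is hypothetical.

```python
import shlex


def secure_erase_commands(device, password="p"):
    """Return the usual hdparm ATA secure-erase sequence for one drive.

    Placeholder values only: the device path and password are assumptions,
    not the values used on the real hosts.
    """
    return [
        # Inspect the drive; the security section should report "not frozen".
        ["hdparm", "-I", device],
        # Set a temporary user password, which enables the security feature set.
        ["hdparm", "--user-master", "u", "--security-set-pass", password, device],
        # Issue the erase; the drive clears the password when it completes.
        ["hdparm", "--user-master", "u", "--security-erase", password, device],
    ]


for cmd in secure_erase_commands("/dev/sdX"):
    print(shlex.join(cmd))
```

If the drive reports "frozen" in the first step, the erase command will be rejected; a suspend/resume cycle or hot-replugging the drive is the usual workaround.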

Change 361702 merged by RobH:
[operations/puppet@production] decom of cp4011, cp4012, cp4019, cp4020

https://gerrit.wikimedia.org/r/361702

cp4011 and cp4012 securely erased

RobH changed the task status from Open to Stalled. (Jun 27 2017, 10:01 PM)
RobH removed a project: Patch-For-Review.
RobH updated the task description.

All of these systems have now been wiped and moved around in the racks in ulsfo. Racktables shows their current position, but since the wipe and the move within the racks, they no longer have power/network/mgmt connections.

RobH lowered the priority of this task from Medium to Low. (Aug 25 2017, 5:56 PM)
RobH removed RobH as the assignee of this task. (Dec 14 2017, 7:28 PM)
RobH added a subscriber: RobH.
RobH claimed this task.

I'm resolving this, as all the systems have been decommissioned and added to the decommissioned server tracking Google sheet. They are still in the rack until we have a pickup of decom systems for resale later this fiscal year.

Change 478176 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] Remove various dead cp4005-20 DNS entries

https://gerrit.wikimedia.org/r/478176

Change 478176 merged by BBlack:
[operations/dns@master] Remove various dead cp4005-20 DNS entries

https://gerrit.wikimedia.org/r/478176

Change 493094 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] Remove dead ulsfo cp servers

https://gerrit.wikimedia.org/r/493094

Change 493094 merged by BBlack:
[operations/dns@master] Remove dead ulsfo cp servers

https://gerrit.wikimedia.org/r/493094