
Decommission subra/suhail
Closed, Resolved, Public

Description

The pool counters in codfw are now provided by two Ganeti VMs (poolcounter200[12]), so subra and suhail can be decommissioned or reclaimed to spares (they have been OOW for nearly two years now).
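The bracket shorthand poolcounter200[12] stands for two hostnames. A minimal sketch of the expansion follows; the helper name `expand_host_range` is hypothetical, and it only handles a single bracketed set of single characters, not ranges such as [1-3].

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand one bracketed character set, e.g. 'poolcounter200[12]'."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a single hostname.
        return [pattern]
    prefix, chars, suffix = match.groups()
    # Each character inside the brackets yields one hostname.
    return [f"{prefix}{c}{suffix}" for c in chars]


print(expand_host_range("poolcounter200[12]"))
# → ['poolcounter2001', 'poolcounter2002']
```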

subra:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role::spare if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw-a-codfw:ge-5/0/19
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.
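The "puppet node clean, puppet node deactivate, salt key removed" step in the checklist above can be sketched as a dry run that only prints the commands. This is an illustration under assumptions, not the procedure actually used on this task: the helper name `cleanup_commands` and the placeholder `host.example.org` are hypothetical, and it assumes the Salt minion ID equals the Puppet certname.

```python
import shlex


def cleanup_commands(certname: str) -> list[list[str]]:
    """Return the argv lists for the Puppet/Salt cleanup of one host."""
    return [
        ["puppet", "node", "clean", certname],       # remove the node's cert and cached data
        ["puppet", "node", "deactivate", certname],  # mark the node inactive in PuppetDB
        ["salt-key", "-d", certname],                # delete the minion key (assumes ID == certname)
    ]


# Dry run: print the commands instead of executing them.
for argv in cleanup_commands("host.example.org"):
    print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; where each command would actually have to run (Puppet master, Salt master) is not stated in this task.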

suhail:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role::spare if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw-b-codfw:ge-5/0/11
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.
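The switch port notes recorded in the two checklists (asw-a-codfw:ge-5/0/19 and asw-b-codfw:ge-5/0/11) have the form switch:interface. The sketch below splits them into fields, assuming the interface follows the Junos type-fpc/pic/port naming convention; that the switches run Junos is inferred from the `ge-` prefix rather than stated in this task, and the names `SwitchPort` and `parse_switch_port` are hypothetical.

```python
from typing import NamedTuple


class SwitchPort(NamedTuple):
    switch: str  # switch name, e.g. 'asw-a-codfw'
    media: str   # interface type, e.g. 'ge' (gigabit Ethernet)
    fpc: int     # FPC slot, or member number in a virtual chassis
    pic: int     # PIC slot
    port: int    # port number


def parse_switch_port(text: str) -> SwitchPort:
    """Parse a note such as 'asw-a-codfw:ge-5/0/19'."""
    # Split on the colon first, because the switch name itself contains hyphens.
    switch, interface = text.split(":", 1)
    media, _, position = interface.partition("-")
    fpc, pic, port = (int(part) for part in position.split("/"))
    return SwitchPort(switch, media, fpc, pic, port)


for note in ("asw-a-codfw:ge-5/0/19", "asw-b-codfw:ge-5/0/11"):
    print(parse_switch_port(note))
```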

Event Timeline

Dzahn triaged this task as Medium priority.

Change 363110 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] decom subra and suhail

https://gerrit.wikimedia.org/r/363110

RobH renamed this task from Reclaim/Decommission subra/suhail to Decommission subra/suhail. Jul 5 2017, 7:37 PM
RobH updated the task description.

Mentioned in SAL (#wikimedia-operations) [2017-07-05T22:16:32Z] <mutante> subra/suhail: disabling puppet, stopping poolcounterd, stopping other services, first step of decom, replaced by poolcounter200[12] (T169506)

Change 363110 merged by Dzahn:
[operations/puppet@production] decom subra and suhail

https://gerrit.wikimedia.org/r/363110

Mentioned in SAL (#wikimedia-operations) [2017-07-05T22:29:02Z] <mutante> subra/suhail: re-enabled puppet, now with role::spare, no more poolcounter, scheduled icinga downtimes for decom (T169506)

@RobH thanks for adding the template! I did all the check boxes up to "non-interruptible". I can't continue that part myself due to lack of switch access. Let me know when you have some time for the switch port part then we can do it together.

Dzahn subscribed.

Change 363648 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decommission of subra/suhail

https://gerrit.wikimedia.org/r/363648

Change 363649 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom of subra/suhail

https://gerrit.wikimedia.org/r/363649

Change 363648 merged by RobH:
[operations/dns@master] decommission of subra/suhail

https://gerrit.wikimedia.org/r/363648

Change 363649 merged by RobH:
[operations/puppet@production] decom of subra/suhail

https://gerrit.wikimedia.org/r/363649

RobH edited projects, added ops-codfw; removed Patch-For-Review.
RobH added a subscriber: Papaul.

@Papaul,

Please go ahead and wipe the disks and decom/unrack these systems. Thanks!

Change 367919 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS entries for subra and suhail

https://gerrit.wikimedia.org/r/367919

Change 367919 merged by RobH:
[operations/dns@master] DNS: Remove mgmt DNS entries for subra and suhail

https://gerrit.wikimedia.org/r/367919

merging papaul's dns change and removing switch port config

RobH removed a project: Patch-For-Review.
RobH updated the task description.