
Decommission subra/suhail
Closed, Resolved · Public

Description

The pool counters in codfw are now provided by two Ganeti VMs (poolcounter200[12]), so subra and suhail can be decommissioned or reclaimed as spares (they have been out of warranty (OOW) for nearly two years).

subra:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while the reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove the node from site.pp (replace with role::spare if the system isn't shut down immediately during this process).

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • remove all remaining puppet references (including role::spare)
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal): asw-a-codfw:ge-5/0/19
  • remove production DNS entries
  • puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update racktables with result
  • switch port configuration removed from switch once system is unracked.
  • mgmt DNS entries removed.

suhail:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while the reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove the node from site.pp (replace with role::spare if the system isn't shut down immediately during this process).

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • remove all remaining puppet references (including role::spare)
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal): asw-b-codfw:ge-5/0/11
  • remove production DNS entries
  • puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update racktables with result
  • switch port configuration removed from switch once system is unracked.
  • mgmt DNS entries removed.

Details

Related Gerrit Patches:
operations/dns (master): DNS: Remove mgmt DNS entries for subra and suhail
operations/puppet (production): decom of subra/suhail
operations/dns (master): decommission of subra/suhail
operations/puppet (production): decom subra and suhail

Event Timeline

Dzahn claimed this task. Jul 4 2017, 2:47 AM
Dzahn triaged this task as Medium priority.

Change 363110 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] decom subra and suhail

https://gerrit.wikimedia.org/r/363110

RobH updated the task description. Jul 5 2017, 7:34 PM
RobH moved this task from Backlog to Reclaim (Spares/Decommission) on the hardware-requests board.
RobH renamed this task from Reclaim/Decommission subra/suhail to Decommission subra/suhail. Jul 5 2017, 7:37 PM
RobH updated the task description.

Mentioned in SAL (#wikimedia-operations) [2017-07-05T22:16:32Z] <mutante> subra/suhail: disabling puppet, stopping poolcounterd, stopping other services, first step of decom, replaced by poolcounter200[12] (T169506)

Dzahn updated the task description. Jul 5 2017, 10:21 PM

Change 363110 merged by Dzahn:
[operations/puppet@production] decom subra and suhail

https://gerrit.wikimedia.org/r/363110

Mentioned in SAL (#wikimedia-operations) [2017-07-05T22:29:02Z] <mutante> subra/suhail: re-enabled puppet, now with role::spare, no more poolcounter, scheduled icinga downtimes for decom (T169506)

Dzahn updated the task description. Jul 5 2017, 10:29 PM
Dzahn added a comment (edited). Jul 5 2017, 10:33 PM

@RobH thanks for adding the template! I completed all the checkboxes up to "non-interruptible". I can't continue past that point myself because I lack switch access. Let me know when you have some time for the switch port part, and we can do it together.

Dzahn added a subscriber: RobH. Jul 5 2017, 10:33 PM
Dzahn reassigned this task from Dzahn to RobH. Jul 5 2017, 10:45 PM
Dzahn added a subscriber: Dzahn.
RobH updated the task description. Jul 6 2017, 6:38 PM

Change 363648 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decommission of subra/suhail

https://gerrit.wikimedia.org/r/363648

Change 363649 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom of subra/suhail

https://gerrit.wikimedia.org/r/363649

Change 363648 merged by RobH:
[operations/dns@master] decommission of subra/suhail

https://gerrit.wikimedia.org/r/363648

RobH updated the task description. Jul 6 2017, 6:50 PM

Change 363649 merged by RobH:
[operations/puppet@production] decom of subra/suhail

https://gerrit.wikimedia.org/r/363649

RobH reassigned this task from RobH to Papaul. Jul 6 2017, 6:53 PM
RobH edited projects, added ops-codfw; removed Patch-For-Review.
RobH added a subscriber: Papaul.

@Papaul,

Please go ahead and wipe the disks and decom/unrack these systems. Thanks!

RobH moved this task from Backlog to Non-Urgent on the ops-codfw board. Jul 6 2017, 6:54 PM
Papaul updated the task description. Jul 26 2017, 4:25 PM
Papaul updated the task description. Jul 26 2017, 5:10 PM

Change 367919 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS entries for subra and suhail

https://gerrit.wikimedia.org/r/367919

Papaul updated the task description. Jul 26 2017, 5:17 PM

Change 367919 merged by RobH:
[operations/dns@master] DNS: Remove mgmt DNS entries for subra and suhail

https://gerrit.wikimedia.org/r/367919

RobH claimed this task. Jul 26 2017, 5:28 PM

Merging Papaul's DNS change and removing the switch port config.

RobH closed this task as Resolved. Jul 26 2017, 5:35 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.