Decommission rdb2001, rdb2002
Closed, ResolvedPublic

Description

rdb2001:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port - port is NOT labeled on switch, so it could not be disabled. onsite will need to physically trace and disable the port directly
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update Netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

rdb2002:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port - port is NOT labeled on switch, so it could not be disabled. onsite will need to physically trace and disable the port directly
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update Netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
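
The DebMonitor removal step in both checklists uses the same request with a different ${HOST_FQDN}, so it could be looped over the two hosts roughly as sketched below. The URL and certificate paths are the ones given in the checklist; the helper name debmonitor_delete_cmd and the DRY_RUN variable are hypothetical names introduced only for this illustration. By default the sketch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: print (or, with DRY_RUN=0, run) the DebMonitor DELETE request
# from the checklist for each host being decommissioned.
# debmonitor_delete_cmd and DRY_RUN are illustrative names, not existing tooling.

DEBMONITOR_BASE="https://debmonitor.discovery.wmnet/hosts"
CERT="/etc/debmonitor/ssl/cert.pem"
KEY="/etc/debmonitor/ssl/server.key"

# Build the curl command line for one host FQDN (passed as $1).
debmonitor_delete_cmd() {
    printf 'curl -X DELETE %s/%s --cert %s --key %s\n' \
        "$DEBMONITOR_BASE" "$1" "$CERT" "$KEY"
}

for host in rdb2001.codfw.wmnet rdb2002.codfw.wmnet; do
    cmd=$(debmonitor_delete_cmd "$host")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only show what would be executed.
        echo "would run: sudo $cmd"
    else
        # Word splitting of $cmd is intentional here.
        sudo $cmd
    fi
done
```

As in the checklist, the real request would have to be issued from a host that holds the DebMonitor client certificate (neodymium/sarin at the time).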

Event Timeline

jijiki created this task. Nov 13 2018, 10:09 PM
ArielGlenn triaged this task as Normal priority. Nov 14 2018, 8:28 AM
jijiki added a subscriber: Volans. Nov 14 2018, 6:34 PM
jijiki updated the task description. Nov 14 2018, 6:38 PM
jijiki updated the task description.
jijiki moved this task from Backlog/Radar to In Progress on the User-jijiki board. Nov 14 2018, 6:44 PM

"Reimage rdb2001, rdb2002 to stretch and change their role to spare::system"
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/472714/

Script wmf-auto-reimage was launched by volans on cumin2001.codfw.wmnet for hosts:

rdb2001.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/201811161636_volans_15261_rdb2001_codfw_wmnet.log.

Completed auto-reimage of hosts:

['rdb2001.codfw.wmnet']

and were ALL successful.

jijiki updated the task description. Nov 19 2018, 9:06 AM
jijiki removed a subscriber: Volans.
jijiki moved this task from In Progress to Misc on the User-jijiki board. Nov 19 2018, 9:10 AM

rdb2001 was used for a demo, thus it was re-imaged.

jijiki removed a subscriber: jijiki. Jan 3 2019, 8:44 AM

Change 482295 had a related patch set uploaded (by Effie Mouzeli; owner: Muehlenhoff):
[operations/puppet@production] Remove obsolete Hiera files

https://gerrit.wikimedia.org/r/482295

Change 482295 merged by Effie Mouzeli:
[operations/puppet@production] Remove obsolete Hiera files

https://gerrit.wikimedia.org/r/482295

RobH claimed this task. Mar 7 2019, 7:25 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.

wmf-decommission-host was executed by robh for rdb2001.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb2002.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

RobH updated the task description. Mar 7 2019, 7:33 PM

Change 495026 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] rdb200[12] prod dns decom

https://gerrit.wikimedia.org/r/495026

Change 495027 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] rdb200[12] decom

https://gerrit.wikimedia.org/r/495027

Change 495026 merged by RobH:
[operations/dns@master] rdb200[12] prod dns decom

https://gerrit.wikimedia.org/r/495026

Change 495027 merged by RobH:
[operations/puppet@production] rdb200[12] decom

https://gerrit.wikimedia.org/r/495027

RobH reassigned this task from RobH to Papaul. Mar 7 2019, 7:41 PM
RobH added a project: ops-codfw.
RobH moved this task from Backlog to Decommission on the ops-codfw board.
RobH added a subscriber: RobH.

The ports are labeled on the switch, but with the wrong labels (rbd instead of rdb):

papaul@asw-a-codfw> show interfaces descriptions | match "ge-5/0/[0-1]"   
ge-5/0/0        up    up   rbd2001
ge-5/0/1        up    up   rbd2002
papaul@asw-a-codfw> show interfaces descriptions | match "ge-5/0/[0-1]"    
ge-5/0/0        down  down DISABLED
ge-5/0/1        down  down DISABLED
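
A check along the lines of the sketch below could confirm from captured CLI output that both ports ended up down and relabeled. It assumes the `show interfaces descriptions` output has been saved as plain text with the four columns shown above; the function name ports_not_disabled is hypothetical.

```shell
#!/bin/sh
# Sketch: read captured "show interfaces descriptions" output on stdin and
# list every ge- interface that is not "down down" with description DISABLED.
# ports_not_disabled is an illustrative name, not an existing tool.
ports_not_disabled() {
    awk '$1 ~ /^ge-/ && !($2 == "down" && $3 == "down" && $4 == "DISABLED") { print $1 }'
}

# Second capture from this task, taken after the ports were disabled.
after='ge-5/0/0        down  down DISABLED
ge-5/0/1        down  down DISABLED'

remaining=$(printf '%s\n' "$after" | ports_not_disabled)
if [ -z "$remaining" ]; then
    echo "all ports disabled"
else
    echo "still active: $remaining"
fi
```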

Papaul updated the task description. Mar 13 2019, 4:46 PM
Papaul updated the task description. Mar 14 2019, 5:54 PM

@RobH Is there any reason we have to put those servers in the spares tracking sheet if the warranty expiration date was Feb. 26, 2018?

RobH reassigned this task from Papaul to faidon. Mar 14 2019, 6:04 PM
RobH added subscribers: faidon, Papaul.

We don't automatically throw away out-of-warranty systems, so it is really up to @faidon whether we keep these two systems (whose warranty ended in Feb 2018) and add them to spares tracking, or decommission and dispose of them (and not add them to spares tracking).

So, this needs @faidon's input!

Change 496504 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: remove mgmt DNS name for rdb200[1-2]

https://gerrit.wikimedia.org/r/496504

Papaul updated the task description. Mar 14 2019, 6:13 PM

Change 496504 merged by Dzahn:
[operations/dns@master] DNS: remove mgmt DNS name for rdb200[1-2]

https://gerrit.wikimedia.org/r/496504

faidon reassigned this task from faidon to RobH. May 16 2019, 4:44 PM

I don't know why this needs my input? This sounds like a standard decom, unless I misunderstand it.

RobH added a comment. May 16 2019, 4:45 PM

I don't know why this needs my input? This sounds like a standard decom, unless I misunderstand it.

We normally auto-decom systems that are 5+ years old, and these are just over 4, so I just wanted your approval to decom rather than add to spares.

Sure, that sounds fine :)

Papaul renamed this task from Reclaim rdb2001, rdb2002 to Decommission rdb2001, rdb2002. May 16 2019, 4:50 PM
Papaul claimed this task.
Papaul updated the task description. May 16 2019, 4:56 PM
Papaul updated the task description. May 21 2019, 8:39 PM
Papaul updated the task description. May 28 2019, 11:07 PM

Change 513608 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt asset tag for rdb200[1-2]

https://gerrit.wikimedia.org/r/513608

Change 513608 merged by Marostegui:
[operations/dns@master] DNS: Remove mgmt asset tag for rdb200[1-2]

https://gerrit.wikimedia.org/r/513608

Papaul closed this task as Resolved. Jun 5 2019, 2:34 PM

This is complete

Papaul updated the task description. Jun 5 2019, 2:34 PM