Decommission rdb1001, rdb1002, rdb1003, rdb1004, rdb1007, rdb1008
Closed, Resolved (Public)

Description

rdb1001:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update netbox with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

rdb1002:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update netbox with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

rdb1003:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update netbox with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

rdb1004:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update netbox with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

rdb1007:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update netbox with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

rdb1008:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration (N/A)
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update netbox with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.
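
The DebMonitor removal step in each checklist uses the same curl command with only ${HOST_FQDN} changing. A minimal sketch of how that command could be generated per host is shown here; it only prints the commands and does not run them. The URL, certificate path and key path are copied from the checklist. The `eqiad.wmnet` domain is an assumption based on the FQDNs logged further down this task, and the function name `debmonitor_delete_argv` is hypothetical.

```python
import shlex
from typing import List

# Values copied from the checklist's curl command.
DEBMONITOR_URL = "https://debmonitor.discovery.wmnet/hosts/{fqdn}"
CERT = "/etc/debmonitor/ssl/cert.pem"
KEY = "/etc/debmonitor/ssl/server.key"

# Assumption: all six hosts share this domain.
DOMAIN = "eqiad.wmnet"
HOSTS = ["rdb1001", "rdb1002", "rdb1003", "rdb1004", "rdb1007", "rdb1008"]


def debmonitor_delete_argv(host: str, domain: str = DOMAIN) -> List[str]:
    """Build the argument vector for the DebMonitor DELETE request of one host."""
    fqdn = f"{host}.{domain}"
    return [
        "sudo", "curl", "-X", "DELETE",
        DEBMONITOR_URL.format(fqdn=fqdn),
        "--cert", CERT,
        "--key", KEY,
    ]


# Print one shell-quoted command per host for review; nothing is executed.
for host in HOSTS:
    print(shlex.join(debmonitor_delete_argv(host)))
```

Building an argument list and quoting it with `shlex.join` keeps the hostname out of any hand-assembled shell string, which matters because these steps are not meant to be interrupted or retried casually.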

In T209181#5011725:

NETWORK PORT INFO

rdb1001:asw2-c-eqiad:ge-4/0/9
rdb1002:asw2-c-eqiad:ge-7/0/18
rdb1003:asw-a-eqiad:ge-4/0/43
rdb1004:asw2-b-eqiad:ge-4/0/43
rdb1007:asw2-c-eqiad:ge-4/0/3
rdb1008:asw2-c-eqiad:ge-5/0/30
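
The port list above follows a `host:switch:interface` layout, so it can be grouped by switch before the ports are disabled and later cleaned up. The following is a rough sketch of that grouping, not a tool used on this task; the names `PortAssignment` and `parse_port_line` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class PortAssignment(NamedTuple):
    host: str
    switch: str
    interface: str


def parse_port_line(line: str) -> PortAssignment:
    """Split 'host:switch:interface'; strip() guards against stray spaces."""
    host, switch, interface = (part.strip() for part in line.split(":"))
    return PortAssignment(host, switch, interface)


PORT_INFO = """\
rdb1001:asw2-c-eqiad:ge-4/0/9
rdb1002:asw2-c-eqiad:ge-7/0/18
rdb1003:asw-a-eqiad:ge-4/0/43
rdb1004:asw2-b-eqiad:ge-4/0/43
rdb1007:asw2-c-eqiad:ge-4/0/3
rdb1008:asw2-c-eqiad:ge-5/0/30
"""

assignments = [parse_port_line(l) for l in PORT_INFO.splitlines() if l.strip()]

# Group (interface, host) pairs under the switch they belong to.
ports_by_switch = defaultdict(list)
for a in assignments:
    ports_by_switch[a.switch].append((a.interface, a.host))

for switch, ports in sorted(ports_by_switch.items()):
    print(switch, ports)
```

Grouping this way shows that four of the six hosts sit on asw2-c-eqiad, so most of the port cleanup can happen in one switch session.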

Event Timeline

jijiki triaged this task as Medium priority. Nov 9 2018, 8:44 PM
jijiki created this task.

Change 472714 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2003/rdb2004, switch rdb100[123478] to spare::system

https://gerrit.wikimedia.org/r/472714

Change 472714 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2003/rdb2004, switch rdb100[123478] to spare::system

https://gerrit.wikimedia.org/r/472714
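
The commit subjects on this task abbreviate the six hosts as `rdb100[123478]`. As a minimal sketch of how that shorthand expands, the hypothetical helper `expand_bracket` below treats each character inside the brackets as one alternative. It is an illustration only, not the parser of any Wikimedia tool, and it does not support ranges such as `[1-4]`.

```python
import re
from typing import List

# Matches the first [...] group in a pattern.
_BRACKET = re.compile(r"\[([^\]]+)\]")


def expand_bracket(pattern: str) -> List[str]:
    """Expand 'rdb100[123478]' style shorthand into individual names.

    Each character inside the brackets is one alternative. Ranges like
    '[1-4]' are NOT handled; the '-' would be treated as a literal.
    """
    match = _BRACKET.search(pattern)
    if match is None:
        return [pattern]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    names: List[str] = []
    for ch in match.group(1):
        # Recurse so that any later bracket group in the suffix is expanded too.
        names.extend(expand_bracket(f"{prefix}{ch}{suffix}"))
    return names


print(expand_bracket("rdb100[123478]"))
```

The expansion yields rdb1001, rdb1002, rdb1003, rdb1004, rdb1007 and rdb1008, matching the hosts named in the task title.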

Change 482295 had a related patch set uploaded (by Effie Mouzeli; owner: Muehlenhoff):
[operations/puppet@production] Remove obsolete Hiera files

https://gerrit.wikimedia.org/r/482295

Change 482295 merged by Effie Mouzeli:
[operations/puppet@production] Remove obsolete Hiera files

https://gerrit.wikimedia.org/r/482295

Mentioned in SAL (#wikimedia-operations) [2019-03-08T17:30:00Z] <robh> decom in progress for rdb100[123478] via T209181

RobH updated the task description.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.

NETWORK PORT INFO

rdb1001:asw2-c-eqiad:ge-4/0/9
rdb1002:asw2-c-eqiad:ge-7/0/18
rdb1003:asw-a-eqiad:ge-4/0/43
rdb1004:asw2-b-eqiad:ge-4/0/43
rdb1007:asw2-c-eqiad:ge-4/0/3
rdb1008:asw2-c-eqiad:ge-5/0/30

wmf-decommission-host was executed by robh for rdb1001.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1002.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1003.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1004.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1007.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1008.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

Change 495274 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom rdb100[123478].eqiad.wmnet dns entries

https://gerrit.wikimedia.org/r/495274

Change 495274 merged by RobH:
[operations/dns@master] decom rdb100[123478].eqiad.wmnet dns entries

https://gerrit.wikimedia.org/r/495274

Change 495275 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom of rdb100[123478]

https://gerrit.wikimedia.org/r/495275

Change 495275 merged by RobH:
[operations/puppet@production] decom of rdb100[123478]

https://gerrit.wikimedia.org/r/495275

papaul@asw2-c-eqiad# show | compare 
[edit interfaces]
-   ge-5/0/30 {
-       description rdb1008;
-   }
papaul@asw2-c-eqiad# show | compare 
[edit interfaces]
-   ge-4/0/3 {
-       description rdb1007;
-   }
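
The two `show | compare` outputs above each remove one interface stanza. A rough sketch of how such output could be checked against the recorded port assignments follows. The function name `removed_interfaces` is hypothetical, and the parser only handles simple removed stanzas of the shape shown here, not arbitrary Junos diffs.

```python
import re
from typing import Dict, Optional

COMPARE_OUTPUT = """\
[edit interfaces]
-   ge-5/0/30 {
-       description rdb1008;
-   }
[edit interfaces]
-   ge-4/0/3 {
-       description rdb1007;
-   }
"""

# A removed stanza opener, e.g. "-   ge-5/0/30 {"
_IFACE = re.compile(r"^-\s+(\S+)\s+\{$")
# A removed description line, e.g. "-       description rdb1008;"
_DESC = re.compile(r"^-\s+description\s+(.+);$")


def removed_interfaces(diff: str) -> Dict[str, str]:
    """Map each removed interface name to its removed description."""
    removed: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in diff.splitlines():
        line = raw.rstrip()
        m = _IFACE.match(line)
        if m:
            current = m.group(1)
            removed[current] = ""
            continue
        m = _DESC.match(line)
        if m and current is not None:
            removed[current] = m.group(1)
        elif line.startswith("-") and line.strip("- ") == "}":
            current = None  # end of the removed stanza
    return removed


# Port assignments for these two hosts as recorded earlier on this task.
expected = {"ge-5/0/30": "rdb1008", "ge-4/0/3": "rdb1007"}
print(removed_interfaces(COMPARE_OUTPUT) == expected)
```

Both removed stanzas are on asw2-c-eqiad and agree with the ports recorded for rdb1008 and rdb1007; the task shows no equivalent output for the other four hosts.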
RobH edited subscribers, added: RobH; removed: Papaul. Apr 1 2020, 5:07 PM
RobH removed subscribers: RobH, gerritbot.

Verified all servers are gone; they are on the list that was already sold.