
Decommission rdb1001, rdb1002, rdb1003, rdb1004, rdb1007, rdb1008
Open, Medium, Public

Description

rdb1001:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration (N/A)
  • any service group puppet/hiera/dsh config removed
  • remove the host from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate
  • remove debmonitor entries (run on neodymium/sarin): sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update Netbox with result
  • switch port configuration removed from switch once system is unracked.
  • add system to decommission tracking Google sheet
  • mgmt DNS entries removed.
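The Puppet and debmonitor cleanup commands in the non-interruptible section can be sketched in Python as below. This is a minimal illustration, not the tooling used on this task: `cleanup_commands` and `run` are hypothetical helpers, the URL and certificate paths are copied from the checklist, and in practice the commands run on different hosts (the task names neodymium/sarin only for the debmonitor step), so the `dry_run=False` path assumes a host where all three are available.

```python
import shlex
import subprocess

# Endpoint and certificate paths copied from the debmonitor step in the checklist.
DEBMONITOR_URL = "https://debmonitor.discovery.wmnet/hosts/{fqdn}"
DEBMONITOR_CERT = "/etc/debmonitor/ssl/cert.pem"
DEBMONITOR_KEY = "/etc/debmonitor/ssl/server.key"


def cleanup_commands(fqdn: str) -> list[list[str]]:
    """Build the argv lists for the Puppet and debmonitor cleanup steps (hypothetical helper)."""
    return [
        # Clean up what the Puppet master holds for the node, including its certificate.
        ["puppet", "node", "clean", fqdn],
        # Deactivate the node in PuppetDB.
        ["puppet", "node", "deactivate", fqdn],
        # Same request as the checklist's curl command (run on neodymium/sarin).
        [
            "sudo", "curl", "-X", "DELETE",
            DEBMONITOR_URL.format(fqdn=fqdn),
            "--cert", DEBMONITOR_CERT,
            "--key", DEBMONITOR_KEY,
        ],
    ]


def run(fqdn: str, dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False (hypothetical helper)."""
    for argv in cleanup_commands(fqdn):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run("rdb1001.eqiad.wmnet")  # dry run: only prints the three commands
```

Printing with `shlex.join` keeps the output copy-pasteable into a shell, which is why the default is a dry run.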

rdb1002:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration (N/A)
  • any service group puppet/hiera/dsh config removed
  • remove the host from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate
  • remove debmonitor entries (run on neodymium/sarin): sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update Netbox with result
  • switch port configuration removed from switch once system is unracked.
  • add system to decommission tracking Google sheet
  • mgmt DNS entries removed.

rdb1003:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration (N/A)
  • any service group puppet/hiera/dsh config removed
  • remove the host from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate
  • remove debmonitor entries (run on neodymium/sarin): sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update Netbox with result
  • switch port configuration removed from switch once system is unracked.
  • add system to decommission tracking Google sheet
  • mgmt DNS entries removed.

rdb1004:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration (N/A)
  • any service group puppet/hiera/dsh config removed
  • remove the host from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate
  • remove debmonitor entries (run on neodymium/sarin): sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update Netbox with result
  • switch port configuration removed from switch once system is unracked.
  • add system to decommission tracking Google sheet
  • mgmt DNS entries removed.

rdb1007:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration (N/A)
  • any service group puppet/hiera/dsh config removed
  • remove the host from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate
  • remove debmonitor entries (run on neodymium/sarin): sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update Netbox with result
  • switch port configuration removed from switch once system is unracked.
  • add system to decommission tracking Google sheet
  • mgmt DNS entries removed.

rdb1008:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • remove system from all LVS/PyBal active configuration (N/A)
  • any service group puppet/hiera/dsh config removed
  • remove the host from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate
  • remove debmonitor entries (run on neodymium/sarin): sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), update Netbox with result
  • switch port configuration removed from switch once system is unracked.
  • add system to decommission tracking Google sheet
  • mgmt DNS entries removed.
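Read as a procedure, each host's checklist has three phases, and the START/END non-interruptible markers mean a host may be left between phases but not partway through the middle one. A minimal Python model of that rule follows; `Phase`, `CHECKLIST` and `safe_to_pause` are illustrative names, the step text is shortened from the checklist, and it assumes steps are completed strictly in the listed order.

```python
from enum import Enum


class Phase(Enum):
    PRE = "pre"
    NON_INTERRUPTIBLE = "non-interruptible"
    POST = "post"


# Step order from each per-host checklist above (wording shortened).
CHECKLIST = [
    (Phase.PRE, "services confirmed offline"),
    (Phase.PRE, "Icinga checks in maint mode/disabled"),
    (Phase.PRE, "removed from LVS/PyBal (N/A)"),
    (Phase.PRE, "service group puppet/hiera/dsh config removed"),
    (Phase.PRE, "site.pp entry removed or set to role(spare::system)"),
    (Phase.NON_INTERRUPTIBLE, "disable puppet"),
    (Phase.NON_INTERRUPTIBLE, "power down host"),
    (Phase.NON_INTERRUPTIBLE, "disable switch port"),
    (Phase.NON_INTERRUPTIBLE, "note switch port on task"),
    (Phase.NON_INTERRUPTIBLE, "remove remaining puppet references"),
    (Phase.NON_INTERRUPTIBLE, "remove production DNS entries"),
    (Phase.NON_INTERRUPTIBLE, "puppet node clean/deactivate"),
    (Phase.NON_INTERRUPTIBLE, "remove debmonitor entries"),
    (Phase.POST, "disks wiped"),
    (Phase.POST, "unracked, Netbox updated"),
    (Phase.POST, "switch port config removed"),
    (Phase.POST, "added to tracking sheet"),
    (Phase.POST, "mgmt DNS entries removed"),
]

# Indices of the first and last non-interruptible steps.
_NI = [i for i, (phase, _) in enumerate(CHECKLIST) if phase is Phase.NON_INTERRUPTIBLE]
FIRST_NI, LAST_NI = _NI[0], _NI[-1]


def safe_to_pause(completed: int) -> bool:
    """Return True if stopping after the first `completed` steps leaves the
    non-interruptible block either untouched or fully done."""
    if not 0 <= completed <= len(CHECKLIST):
        raise ValueError(f"completed must be between 0 and {len(CHECKLIST)}")
    started = completed > FIRST_NI   # at least one non-interruptible step is done
    finished = completed > LAST_NI   # the last non-interruptible step is done
    return not started or finished


if __name__ == "__main__":
    for n in range(len(CHECKLIST) + 1):
        print(n, "pause OK" if safe_to_pause(n) else "keep going")
```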

NETWORK PORT INFO
rdb1001:asw2-c-eqiad:ge-4/0/9
rdb1002:asw2-c-eqiad:ge-7/0/18
rdb1003:asw-a-eqiad:ge-4/0/43
rdb1004:asw2-b-eqiad:ge-4/0/43
rdb1007:asw2-c-eqiad:ge-4/0/3
rdb1008:asw2-c-eqiad:ge-5/0/30
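Because these port assignments are needed again when the switch configuration is removed after unracking, it can help to group them by switch. A minimal Python sketch, assuming the `host:switch:interface` layout above and Junos-style `ge-<fpc>/<pic>/<port>` interface names; `parse_port_info` is a hypothetical helper, not part of any WMF tooling.

```python
import re
from collections import defaultdict

# Copied from the NETWORK PORT INFO list above (host:switch:interface).
PORT_INFO = """\
rdb1001:asw2-c-eqiad:ge-4/0/9
rdb1002:asw2-c-eqiad:ge-7/0/18
rdb1003:asw-a-eqiad:ge-4/0/43
rdb1004:asw2-b-eqiad:ge-4/0/43
rdb1007:asw2-c-eqiad:ge-4/0/3
rdb1008:asw2-c-eqiad:ge-5/0/30
"""

# Junos-style interface name: <type>-<fpc>/<pic>/<port>, e.g. ge-4/0/9.
IFACE_RE = re.compile(r"^[a-z]+-\d+/\d+/\d+$")


def parse_port_info(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (host, interface) pairs by switch name (hypothetical helper)."""
    by_switch: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # strip() tolerates stray spaces such as "rdb1008: asw2-c-eqiad".
        host, switch, iface = (field.strip() for field in line.split(":"))
        if not IFACE_RE.match(iface):
            raise ValueError(f"unexpected interface name: {iface!r}")
        by_switch[switch].append((host, iface))
    return dict(by_switch)


if __name__ == "__main__":
    for switch, ports in sorted(parse_port_info(PORT_INFO).items()):
        print(switch + ": " + ", ".join(f"{host} {iface}" for host, iface in ports))
```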

Details

Related Gerrit Patches:
[operations/puppet@production] decom of rdb100[123478]
[operations/dns@master] decom rdb100[123478].eqiad.wmnet dns entries
[operations/puppet@production] Remove obsolete Hiera files
[operations/puppet@production] Reimage rdb2003/rdb2004, switch rdb100[123478] to spare::system
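The change subjects use the shorthand `rdb100[123478]`, which reads as a shell-style glob: the bracket matches exactly one of the listed digits. A quick Python check with the standard-library `fnmatch` module that the shorthand covers exactly the six hosts in the task title and none of their neighbours; the `covered` helper and the candidate range are illustrative.

```python
from fnmatch import fnmatchcase

# Hosts named in the task title.
HOSTS = ["rdb1001", "rdb1002", "rdb1003", "rdb1004", "rdb1007", "rdb1008"]

# Shorthand from the change subjects; [123478] matches exactly one of those digits.
PATTERN = "rdb100[123478]"

# Candidate names rdb1001 .. rdb1010, to check that nothing extra is matched.
CANDIDATES = [f"rdb{n}" for n in range(1001, 1011)]


def covered(names: list[str], pattern: str) -> list[str]:
    """Return the names matching a shell-style glob (case-sensitive), in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    matched = covered(CANDIDATES, PATTERN)
    print(matched)
    print("matches task title:", matched == HOSTS)
```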

Event Timeline

jijiki triaged this task as Medium priority. Nov 9 2018, 8:44 PM
jijiki created this task.
Restricted Application removed a project: Patch-For-Review. Nov 9 2018, 8:44 PM

Change 472714 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] Reimage rdb2003/rdb2004, switch rdb100[123478] to spare::system

https://gerrit.wikimedia.org/r/472714

jijiki updated the task description. Nov 9 2018, 8:45 PM

Change 472714 merged by Effie Mouzeli:
[operations/puppet@production] Reimage rdb2003/rdb2004, switch rdb100[123478] to spare::system

https://gerrit.wikimedia.org/r/472714

RobH updated the task description. Nov 13 2018, 9:55 PM
jijiki updated the task description. Nov 14 2018, 6:40 PM
jijiki moved this task from In Progress to Misc on the User-jijiki board. Nov 19 2018, 9:10 AM
jijiki removed a subscriber: jijiki. Jan 3 2019, 8:44 AM

Change 482295 had a related patch set uploaded (by Effie Mouzeli; owner: Muehlenhoff):
[operations/puppet@production] Remove obsolete Hiera files

https://gerrit.wikimedia.org/r/482295

Change 482295 merged by Effie Mouzeli:
[operations/puppet@production] Remove obsolete Hiera files

https://gerrit.wikimedia.org/r/482295

jijiki updated the task description. Mar 8 2019, 5:26 PM

Mentioned in SAL (#wikimedia-operations) [2019-03-08T17:30:00Z] <robh> decom in progress for rdb100[123478] via T209181

RobH edited projects, added ops-eqiad; removed Patch-For-Review. Mar 8 2019, 5:47 PM
RobH updated the task description.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.
RobH added a subscriber: RobH. Mar 8 2019, 6:03 PM

wmf-decommission-host was executed by robh for rdb1001.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1002.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1003.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1004.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1007.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for rdb1008.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
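The script output above covers only part of each host's checklist. A small Python sketch of which steps remained manual after it ran, assuming the mapping in `AUTOMATED_BY` (an interpretation of the reported actions, not derived from the script's source; Icinga downtime is treated as covering the maint-mode step) and shortened step names.

```python
# Shortened steps from each per-host checklist, in order.
CHECKLIST_STEPS = [
    "services confirmed offline",
    "set icinga checks to maint mode",
    "remove from lvs/pybal (N/A)",
    "remove service group puppet/hiera/dsh config",
    "remove site.pp entry",
    "disable puppet on host",
    "power down host",
    "disable switch port",
    "note switch port on task",
    "remove remaining puppet references",
    "remove production dns entries",
    "puppet node clean",
    "puppet node deactivate",
    "remove debmonitor entries",
    "disks wiped",
    "unracked, netbox updated",
    "switch port config removed",
    "added to tracking sheet",
    "mgmt dns entries removed",
]

# Reported action -> checklist step it appears to cover (assumed mapping).
AUTOMATED_BY = {
    "Revoked Puppet certificate": "puppet node clean",
    "Removed from PuppetDB": "puppet node deactivate",
    "Downtimed host on Icinga": "set icinga checks to maint mode",
    "Downtimed mgmt interface on Icinga": "set icinga checks to maint mode",
    "Removed from DebMonitor": "remove debmonitor entries",
}

# Actions wmf-decommission-host reported for each host above.
REPORTED = [
    "Revoked Puppet certificate",
    "Removed from PuppetDB",
    "Downtimed host on Icinga",
    "Downtimed mgmt interface on Icinga",
    "Removed from DebMonitor",
]


def manual_steps(reported: list[str]) -> list[str]:
    """Return the checklist steps not covered by any reported action, in checklist order."""
    automated = {AUTOMATED_BY[action] for action in reported if action in AUTOMATED_BY}
    return [step for step in CHECKLIST_STEPS if step not in automated]


if __name__ == "__main__":
    for step in manual_steps(REPORTED):
        print("-", step)
```

Under this mapping the DNS removal, the remaining Puppet references and the on-site steps stay manual, which is consistent with the separate DNS and Puppet changes that follow in the timeline.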

Change 495274 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom rdb100[123478].eqiad.wmnet dns entries

https://gerrit.wikimedia.org/r/495274

Change 495274 merged by RobH:
[operations/dns@master] decom rdb100[123478].eqiad.wmnet dns entries

https://gerrit.wikimedia.org/r/495274

Change 495275 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom of rdb100[123478]

https://gerrit.wikimedia.org/r/495275

RobH updated the task description.

Change 495275 merged by RobH:
[operations/puppet@production] decom of rdb100[123478]

https://gerrit.wikimedia.org/r/495275

Jclark-ctr updated the task description. Aug 1 2019, 3:34 PM
Jclark-ctr updated the task description. Aug 1 2019, 4:46 PM
Jclark-ctr updated the task description. Aug 21 2019, 5:58 PM
Jclark-ctr added a subscriber: Cmjohnson.
Jclark-ctr updated the task description. Oct 11 2019, 10:58 PM
Jclark-ctr updated the task description. Oct 11 2019, 11:00 PM
Jclark-ctr updated the task description. Thu, Nov 28, 12:06 AM