
decommission db2014, db2020, db2021, db2022, db2024, db2031
Closed, Resolved | Public | Request

Description

This task will track the decommission-hardware of servers db2014, db2020, db2021, db2022, db2024, db2031.

The first 5 steps should be completed by the service owner who is returning the server to DC-ops (for reclaim to spare or decommissioning, depending on server configuration and age).

While investigating the netbox reports at https://netbox.wikimedia.org/extras/reports/puppetdb.PuppetDB/ I discovered that these db systems were 'active' in netbox but were not showing up in icinga. That leads me (@RobH) to think they are just leftover cruft. The first steps listed for service owners are therefore effectively done, since these hosts are not online doing work or reporting into icinga.
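The check described above (hosts marked active in netbox that are absent from icinga) can be sketched as a comparison of two host lists. This is only an illustration: the function name, the temporary file names and the sample host db2099 are hypothetical, and it assumes each system's host list has already been exported to a plain-text file with one hostname per line. It does not query netbox or icinga.

```shell
#!/bin/sh
# Hypothetical helper: print hosts that are in the first list (for example,
# hosts marked 'active' in netbox) but not in the second (hosts known to icinga).
# ASSUMPTION: both inputs are plain-text files with one hostname per line.
hosts_missing_from_monitoring() {
    netbox_list=$1
    icinga_list=$2
    # -v inverts the match, -x matches whole lines only, -F treats patterns as
    # fixed strings, -f reads the patterns (the icinga hosts) from a file.
    grep -vxFf "$icinga_list" "$netbox_list" | sort -u
}

# Demo with sample data; db2099 is a made-up host that both systems know about.
tmpdir=$(mktemp -d)
printf '%s\n' db2014 db2020 db2099 > "$tmpdir/netbox_active.txt"
printf '%s\n' db2099 > "$tmpdir/icinga_hosts.txt"
hosts_missing_from_monitoring "$tmpdir/netbox_active.txt" "$tmpdir/icinga_hosts.txt"
rm -rf "$tmpdir"
```

With the sample data, the demo lists db2014 and db2020 as candidates for cleanup, since only db2099 appears in both files.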

db2014:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.
  • - disable puppet on host - system already offline
  • - power down host - system already offline
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - already offline
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - already offline
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - change netbox status to offline when unracked

db2020:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.
  • - disable puppet on host - system already offline
  • - power down host - system already offline
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - already offline
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - already offline
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - change netbox status to offline when unracked

db2021:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.
  • - disable puppet on host - system already offline
  • - power down host - system already offline
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - already offline
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - already offline
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - change netbox status to offline when unracked

db2022:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.
  • - disable puppet on host - system already offline
  • - power down host - system already offline
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - already offline
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - already offline
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - change netbox status to offline when unracked

db2024:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.
  • - disable puppet on host - system already offline
  • - power down host - system already offline
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - already offline
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - already offline
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update Netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - change netbox status to offline when unracked

db2031:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.
  • - disable puppet on host - system already offline
  • - power down host - system already offline
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host) - already offline
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host) - already offline
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update Netbox with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - change netbox status to offline when unracked
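The debmonitor step in the checklists above uses ${HOST_FQDN} as a placeholder. A minimal dry-run sketch that expands it for each host on this task could look like the following. It only prints the commands and does not contact debmonitor. The domain suffix codfw.wmnet is an assumption (the task does not state the hosts' FQDNs), and the function name is hypothetical. As the checklist notes, wmf-decommission-host normally handles this step.

```shell
#!/bin/sh
# Hosts listed on this task.
HOSTS="db2014 db2020 db2021 db2022 db2024 db2031"

# ASSUMPTION: the domain suffix is not given in the task; override via DOMAIN.
DOMAIN="${DOMAIN:-codfw.wmnet}"

# Hypothetical helper: print (do not run) the DELETE command from the
# checklist for a single fully qualified host name.
debmonitor_delete_cmd() {
    printf 'sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/%s --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key\n' "$1"
}

# Dry run: emit one command per host for review before anything is executed.
for host in $HOSTS; do
    debmonitor_delete_cmd "${host}.${DOMAIN}"
done
```

Printing the commands first keeps the destructive DELETE as a separate, deliberate step, which seems sensible for hosts whose state is already uncertain.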

Event Timeline

Marostegui subscribed.

All of these hosts were decommissioned as part of T176243, so this is probably a leftover from that.
Removing our tag as there is nothing for us to do. I will stay subscribed to this task in case our help is needed to clarify something.
Thanks!

papaul@asw-a-codfw> show interfaces ge-6/0/17 descriptions    
Interface       Admin Link Description
ge-6/0/17       up    down db2014

papaul@asw-b-codfw> show interfaces ge-6/0/4 descriptions 
Interface       Admin Link Description
ge-6/0/4        up    up   db2020

papaul@asw-b-codfw> show interfaces ge-6/0/5 descriptions    
Interface       Admin Link Description
ge-6/0/5        up    up   db2021

papaul@asw-b-codfw> show interfaces ge-6/0/6 descriptions    
Interface       Admin Link Description
ge-6/0/6        up    up   db2022

papaul@asw-b-codfw> show interfaces ge-6/0/14 descriptions   
Interface       Admin Link Description
ge-6/0/14       up    up   db2031

papaul@asw-a-codfw# run show interfaces ge-6/0/17 descriptions 
Interface       Admin Link Description
ge-6/0/17       down  down DISABLED

papaul@asw-b-codfw# run show interfaces ge-6/0/4 descriptions    
Interface       Admin Link Description
ge-6/0/4        down  down DISABLED

papaul@asw-b-codfw# run show interfaces descriptions | match "ge-6/0/[5-6]"      
ge-6/0/5        down  down DISABLED
ge-6/0/6        down  down DISABLED

papaul@asw-b-codfw# run show interfaces ge-6/0/14 descriptions   
Interface       Admin Link Description
ge-6/0/14       down  down DISABLED
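As a rough aid for the "disable switch port" step, pasted "show interfaces ... descriptions" output like the above can be scanned for ports whose Admin column is not yet down. This is a sketch only: the function name is hypothetical, it assumes the three leading columns shown in the output above (Interface, Admin, Link), and it only considers interfaces whose names start with ge-.

```shell
#!/bin/sh
# Hypothetical helper: read Junos "show interfaces ... descriptions" output on
# stdin and print every ge-* interface whose Admin column is not "down".
# Prompt lines and the "Interface Admin Link Description" header are skipped
# because their first field does not start with "ge-".
ports_still_enabled() {
    awk '$1 ~ /^ge-/ && $2 != "down" { print $1 }'
}

# Example with two lines taken from the output in this task:
# one port before and one port after being disabled.
printf '%s\n' \
    'Interface       Admin Link Description' \
    'ge-6/0/17       up    down db2014' \
    'ge-6/0/4        down  down DISABLED' | ports_still_enabled
```

Fed the post-change output, the function should print nothing, which would indicate that every listed port is administratively down.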

Change 507525 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031

https://gerrit.wikimedia.org/r/507525

Are the "remove all remaining puppet references" and "disable puppet" boxes done?

@RobH, if you have a minute, please check this task for any remaining puppet or DHCP references before I merge the DNS removal patch. Some boxes are still unchecked.

Change 507710 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] removing db2014,db2020, db2021, db2022, db2024, db2031 remaining references

https://gerrit.wikimedia.org/r/507710

Change 507710 merged by RobH:
[operations/puppet@production] removing old db references

https://gerrit.wikimedia.org/r/507710

Change 507525 merged by RobH:
[operations/dns@master] DNS: Remove mgmt and production DNS for db2014,db2020,db2021,db2022,db2024,db2031

https://gerrit.wikimedia.org/r/507525

RobH removed RobH as the assignee of this task.
RobH updated the task description.
RobH removed a project: Patch-For-Review.