
Decommission old eqiad caches
Open, Normal, Public

Description

The following old eqiad caches have been reimaged to role(spare::system) and need to be decommissioned:

cp104[6789], cp1050, cp105[2345], cp1059, cp1060, cp106[2345678]

cp1045, cp1051, cp1058, cp1061 (ex cache-misc)

cp1045:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1046:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1047:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1048:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1049:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1050:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1051:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1052:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1053:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1054:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1055:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:
The following steps cannot be interrupted, as it will leave the system in an unfinished state.
Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1058:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1059:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1060:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1061:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1062:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1063:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1064:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1065:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1066:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1067:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.

cp1068:

  • - remove site.pp, replace with role(spare::system)
  • - disable puppet on host - host has no network connection, not needed
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - nothing to do on switch, this was on asw2-a5-eqiad.
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)
  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - add system to decommission tracking google sheet
  • - mgmt dns entries removed.
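
The debmonitor cleanup step in the checklists above is normally handled by wmf-decommission-host. If it ever had to be repeated by hand for several hosts, the command line could be generated rather than typed. The sketch below is a dry run only: it prints the command quoted in the checklist for each host and executes nothing. The helper name `debmonitor_delete_cmd` is hypothetical, and the `eqiad.wmnet` domain is assumed from the FQDNs that appear elsewhere in this task.

```python
import shlex

# Values copied from the checklist step; nothing here is executed.
DEBMONITOR_URL = "https://debmonitor.discovery.wmnet/hosts/"
CERT = "/etc/debmonitor/ssl/cert.pem"
KEY = "/etc/debmonitor/ssl/server.key"


def debmonitor_delete_cmd(host, domain="eqiad.wmnet"):
    """Return the argv list for the checklist's curl DELETE call (hypothetical helper)."""
    fqdn = f"{host}.{domain}"  # assumed FQDN layout: <host>.eqiad.wmnet
    return ["sudo", "curl", "-X", "DELETE", DEBMONITOR_URL + fqdn,
            "--cert", CERT, "--key", KEY]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before anyone runs them.
    for host in ("cp1058", "cp1059"):
        print(shlex.join(debmonitor_delete_cmd(host)))
```

Building the command as a list and joining it with `shlex.join` keeps the quoting correct if a hostname ever contained an unexpected character.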

Details

Related Gerrit Patches:
[operations/puppet@production] decom cp10[58-68] repo entries
[operations/dns@master] decom cp10[58-68] prod dns
[operations/dns@master] decom cp10[45-55] production dns entries
[operations/puppet@production] decommission old eqiad cache entries

Event Timeline

ema created this task. Nov 2 2018, 1:20 PM
Restricted Application added a project: Operations. Nov 2 2018, 1:20 PM
Restricted Application added a subscriber: Aklapper.
ema renamed this task from Decommission 18 old eqiad caches to Decommission old eqiad caches. Nov 2 2018, 1:42 PM
ema triaged this task as Normal priority.
ema updated the task description.
ema added projects: decommission, ops-eqiad.
ema moved this task from Triage to Hardware on the Traffic board. Nov 2 2018, 1:44 PM
BBlack added a comment. Nov 2 2018, 2:52 PM

In case parsing all those regexes gets annoying/confusing:

The set of cp servers that are being decommed from eqiad is everything with numbers in the range cp1045-cp1068 (which sounds like 24 hosts, but it's actually 22 because two numbers in the middle don't exist: cp1056 and cp1057).

Below that range, the only host left is cp1008, which we're keeping (for now). Above it, cp1069 and cp1070 do not currently exist, and everything from cp1071 up is still in use.
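
As a quick cross-check of the host set, the bracket patterns from the task description can be expanded and compared against the plain numeric range described in this comment. This is only a sketch: the `expand` helper is hypothetical and handles just the single-digit bracket form used in the description (for example cp104[6789]), not the cp10[45-55] shorthand used in the Gerrit patch titles.

```python
import re

# Patterns copied from the task description (both lines of the host list).
DESCRIPTION_PATTERNS = [
    "cp104[6789]", "cp1050", "cp105[2345]", "cp1059", "cp1060",
    "cp106[2345678]", "cp1045", "cp1051", "cp1058", "cp1061",
]


def expand(pattern):
    """Expand one 'prefix[digits]' pattern into hostnames (hypothetical helper)."""
    m = re.fullmatch(r"(\w*)\[(\d+)\](\w*)", pattern)
    if m is None:
        return [pattern]  # plain hostname, nothing to expand
    prefix, digits, suffix = m.groups()
    return [f"{prefix}{d}{suffix}" for d in digits]


def described_hosts(patterns=DESCRIPTION_PATTERNS):
    """All hosts named in the description, sorted."""
    return sorted(h for p in patterns for h in expand(p))


def range_hosts(first=1045, last=1068, missing=(1056, 1057)):
    """cp1045-cp1068, skipping the two numbers that do not exist."""
    return [f"cp{n}" for n in range(first, last + 1) if n not in missing]


if __name__ == "__main__":
    print(len(described_hosts()))            # 22
    print(described_hosts() == range_hosts())  # True
```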

ayounsi added a subscriber: ayounsi. Nov 5 2018, 3:36 PM
Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Nov 7 2018, 1:49 PM

Icinga is flagging broken memory on cp1053. Leaving a note here only, as that host is up for decom anyway.

RobH updated the task description. Feb 28 2019, 10:40 PM
RobH updated the task description. Feb 28 2019, 10:49 PM
RobH added a subscriber: RobH.

wmf-decommission-host was executed by robh for cp1045.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1046.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1047.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1048.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1049.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1050.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1051.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1052.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1053.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1054.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1055.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
RobH added a comment. Edited Feb 28 2019, 11:18 PM

Please note that cp1045-cp1055 all use asw-c-eqiad as their active switch, but ports were also reserved on asw2-c-eqiad for migration (in case the hosts were not decommissioned before the old switch stack was decommissioned).

Output on asw2-c-eqiad:

xe-8/0/0                   cp1045
xe-8/0/1                   cp1046
xe-8/0/2                   cp1047
xe-8/0/3                   cp1048
xe-8/0/4                   cp1049
xe-8/0/5                   cp1050
xe-8/0/6                   cp1051
xe-8/0/7                   cp1052
xe-8/0/8                   cp1053
xe-8/0/9                   cp1054
xe-8/0/10                  cp1055

Changes made to asw-c-eqiad:

robh@asw-c-eqiad# show | compare 
[edit interfaces interface-range vlan-private1-c-eqiad]
     member-range ge-7/0/36 { ... }
+    member-range xe-8/0/11 to xe-8/0/22;
-    member-range xe-8/0/0 to xe-8/0/22;
[edit interfaces interface-range disabled]
     member ge-7/0/16 { ... }
+    member xe-8/0/0;
+    member xe-8/0/1;
+    member xe-8/0/2;
+    member xe-8/0/3;
+    member xe-8/0/4;
+    member xe-8/0/5;
+    member xe-8/0/6;
+    member xe-8/0/7;
+    member xe-8/0/8;
+    member xe-8/0/9;
+    member xe-8/0/10;
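For reference, the per-port `member` additions in the diff above could be generated rather than typed by hand. The following is a minimal sketch, not the procedure actually used on this task: the helper names `port_names` and `disable_commands` are hypothetical, and the emitted `set interfaces interface-range ... member ...` lines assume standard Junos set-command syntax, so review them before pasting into a configure session.

```python
# Hypothetical helpers: build Junos "set" commands that add a run of ports
# to the "disabled" interface-range, mirroring the diff in this comment.

def port_names(fpc, pic, first, last, prefix="xe"):
    """Return interface names such as xe-8/0/0 .. xe-8/0/10 (inclusive)."""
    return [f"{prefix}-{fpc}/{pic}/{port}" for port in range(first, last + 1)]


def disable_commands(ports, range_name="disabled"):
    """Return one set command per port for the given interface-range."""
    return [
        f"set interfaces interface-range {range_name} member {port}"
        for port in ports
    ]


# Ports xe-8/0/0 through xe-8/0/10, as in the diff.
decom_ports = port_names(8, 0, 0, 10)

# Hosts cp1045..cp1055, paired with ports in the order listed in this comment.
hosts = [f"cp{number}" for number in range(1045, 1056)]
port_map = dict(zip(decom_ports, hosts))

commands = disable_commands(decom_ports)

if __name__ == "__main__":
    for command in commands:
        print(command)
```

Shrinking the `vlan-private1-c-eqiad` member-range (the `-`/`+` pair in the diff) is left as a manual step in this sketch, since it is a single edit.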

Change 493626 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decommission old eqiad cache entries

https://gerrit.wikimedia.org/r/493626

Change 493626 merged by RobH:
[operations/puppet@production] decommission old eqiad cache entries

https://gerrit.wikimedia.org/r/493626

Change 493627 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom cp10[45-55] production dns entries

https://gerrit.wikimedia.org/r/493627

Change 493627 merged by RobH:
[operations/dns@master] decom cp10[45-55] production dns entries

https://gerrit.wikimedia.org/r/493627
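The commit subject above uses the bracket shorthand `cp10[45-55]` for the affected hosts. As an illustration only, the sketch below expands that shorthand into fully qualified names; `expand_host_range` is a hypothetical helper, and the default `eqiad.wmnet` domain is taken from the hostnames logged on this task.

```python
import re


def expand_host_range(pattern, domain="eqiad.wmnet"):
    """Expand a pattern like 'cp10[45-55]' into a list of FQDNs.

    Only a single numeric [start-end] group is handled; a pattern with
    no brackets is returned as a one-element list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [f"{pattern}.{domain}"]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep zero padding if the range uses it
    return [
        f"{prefix}{number:0{width}d}{suffix}.{domain}"
        for number in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    for fqdn in expand_host_range("cp10[45-55]"):
        print(fqdn)
```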

RobH updated the task description.
RobH moved this task from Backlog to Ready for Decommission on the decommission board.

wmf-decommission-host was executed by robh for cp1058.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1059.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1060.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1061.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1062.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1063.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1064.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1065.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1066.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1067.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

wmf-decommission-host was executed by robh for cp1068.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Skipped downtime host on Icinga (likely already removed)
  • Skipped downtime mgmt interface on Icinga (likely already removed)
  • Removed from DebMonitor

Change 494617 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom cp10[58-68] prod dns

https://gerrit.wikimedia.org/r/494617

Change 494617 merged by RobH:
[operations/dns@master] decom cp10[58-68] prod dns

https://gerrit.wikimedia.org/r/494617

Change 494618 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom cp10[58-68] repo entries

https://gerrit.wikimedia.org/r/494618

Change 494618 merged by RobH:
[operations/puppet@production] decom cp10[58-68] repo entries

https://gerrit.wikimedia.org/r/494618

RobH assigned this task to Cmjohnson. Mar 5 2019, 11:19 PM
RobH updated the task description.