Decom graphite2001/WMF6160
Closed, Resolved (Public)

Description

This task will track the decommission-hardware of graphite2001/WMF6160

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (handled by wmf-decommission-host)
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, inventory for spare) - updated to planned since we aren't certain of host's future.
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) - asw-b-codfw:ge-5/0/1 - asset tag set to port description
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries & mgmt dns entries for the hostname, leave mgmt dns entries for asset tag in place
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox status to offline and remove from rack in netbox.
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: asset tag mgmt dns entries removed.
  • - IF RECLAIM: remove the hostname label; the system is then back in the 'spares pool' by having the asset tag name in netbox and a state of 'inventory'
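
The DebMonitor removal step in the checklist above is normally handled by wmf-decommission-host. For anyone running it by hand, the following is a minimal Python sketch of the same DELETE request, assuming the URL and certificate paths given in the checklist. The function names `build_delete_request` and `remove_host` are hypothetical and are not part of any WMF tooling.

```python
import ssl
import urllib.request

# Base URL and certificate paths are taken from the curl command in the checklist.
DEBMONITOR_BASE = "https://debmonitor.discovery.wmnet"


def build_delete_request(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for a single host."""
    if not host_fqdn or "/" in host_fqdn:
        raise ValueError(f"unexpected host name: {host_fqdn!r}")
    return urllib.request.Request(
        f"{DEBMONITOR_BASE}/hosts/{host_fqdn}", method="DELETE"
    )


def remove_host(
    host_fqdn: str,
    cert: str = "/etc/debmonitor/ssl/cert.pem",
    key: str = "/etc/debmonitor/ssl/server.key",
) -> int:
    """Send the request with the client certificate and return the HTTP status.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert, keyfile=key)
    request = build_delete_request(host_fqdn)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

Building the request separately from sending it allows the URL and method to be checked on any machine before running `remove_host("graphite2001.codfw.wmnet")` on a host that holds the certificate.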

Event Timeline

RobH triaged this task as Medium priority. Dec 12 2018, 4:35 PM
RobH updated the task description.

I'm taking graphite2001 now to do some tests for prometheus v2 upgrade in T187987: 100% of Prometheus traffic served by Prometheus v2

@fgiunchedi Can you confirm this is no longer needed for that?

Yes confirmed, the host is good to be decom'd

I wanted to check with @wiki_willy if we need to reclaim this to spares, or if we can decommission and pull it out of the rack.

The host was purchased on 2015-01-09, and will be 5 years old (and automatically decommissioned) on 2020-01-09. Since that is less than 5 months away, and spare pool systems tend to sit for a few months before reallocation, it may be easiest/best to just decommission and unrack.
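
The age arithmetic in the comment above can be checked with a short Python sketch. The purchase date comes from the comment; the "as of" date is an assumption, taken from the Aug 23 2019 rename entry in this task's timeline. `five_year_mark` is a hypothetical helper name.

```python
from datetime import date


def five_year_mark(purchased: date, years: int = 5) -> date:
    """Return the date on which the host reaches the given age."""
    try:
        return purchased.replace(year=purchased.year + years)
    except ValueError:
        # Purchased on Feb 29 and the target year is not a leap year.
        return purchased.replace(year=purchased.year + years, day=28)


purchased = date(2015, 1, 9)   # from the comment above
as_of = date(2019, 8, 23)      # assumption: date of the task rename in the timeline
cutoff = five_year_mark(purchased)
days_left = (cutoff - as_of).days
print(cutoff.isoformat(), days_left)  # prints "2020-01-09 139"
```

139 days is roughly four and a half months, consistent with "less than 5 months away".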

cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts: graphite2001.codfw.wmnet

  • graphite2001.codfw.wmnet
    • Removed from Puppet master and PuppetDB
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Removed from DebMonitor

Change 531965 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom graphite2001 dns entries

https://gerrit.wikimedia.org/r/531965

RobH renamed this task from Decom graphite2001 to Decom graphite2001/WMF6160. Aug 23 2019, 7:19 PM
RobH updated the task description.

@RobH - I'll leave it up to @Papaul, since he has a better idea on the chances of reusing the parts on this system. Thanks, Willy

Ok, I synced with @wiki_willy about this and the comment above.

We're going to decommission this host, since it has less than 5 months left before it is 5 years old. Willy said it can go on the decom pile, to be used for spare parts or to stay on the pile, as determined by @Papaul as the codfw on-site.

Thanks!

https://gerrit.wikimedia.org/r/c/operations/dns/+/531965

statsd.codfw.wmnet points to graphite2001.codfw.wmnet, so I'm not sure what to point this at.

I assume it is fine to just remove it entirely, since it was pointing at a server in the spare role.

I've assigned the patchset to @fgiunchedi for his approval/review/comment.
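
A leftover alias such as statsd.codfw.wmnet can be caught before merging by scanning the zone data for CNAME records that still point at the host being removed. The following is a minimal Python sketch of that check. The zone snippet is hypothetical (it is not copied from operations/dns, and the address is from the documentation range), and `records_pointing_at` is an illustrative name.

```python
def records_pointing_at(zone_text: str, target: str) -> list[str]:
    """Return the owner names of CNAME records whose target is the given host."""
    short_name = target.split(".", 1)[0]
    hits = []
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop zone-file comments
        fields = line.split()
        if len(fields) < 3 or "CNAME" not in fields:
            continue
        rdata = fields[-1].rstrip(".")  # tolerate a trailing root dot
        if rdata in (target, short_name):
            hits.append(fields[0])
    return hits


# Hypothetical zone fragment, for illustration only.
SAMPLE_ZONE = """\
graphite2001    1H  IN A      192.0.2.10   ; hypothetical address
statsd          1H  IN CNAME  graphite2001.codfw.wmnet.
other           1H  IN CNAME  somewhere-else.codfw.wmnet.
"""

print(records_pointing_at(SAMPLE_ZONE, "graphite2001.codfw.wmnet"))
```

Any name this reports needs either a new target or removal in the same patchset as the host's own records.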

Change 531968 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] graphite2001 decom

https://gerrit.wikimedia.org/r/531968

RobH updated the task description.
RobH subscribed.

papaul@asw-b-codfw# run show interfaces ge-5/0/1 descriptions
Interface       Admin Link Description
ge-5/0/1        down  down DISABLED
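
The "disable switch port" step can be verified from output like the above. The following is a minimal Python sketch that parses the three-column table and checks that the port is administratively down. It assumes the column layout shown in this task; `parse_descriptions` and `port_is_shut` are hypothetical helper names.

```python
def parse_descriptions(output: str) -> dict[str, dict[str, str]]:
    """Parse 'show interfaces descriptions' output into a per-port mapping."""
    ports = {}
    for line in output.splitlines():
        fields = line.split(None, 3)
        # Skip the prompt line, the header, and anything that is not a port row.
        if len(fields) < 3 or fields[1] not in ("up", "down"):
            continue
        ports[fields[0]] = {
            "admin": fields[1],
            "link": fields[2],
            "description": fields[3].strip() if len(fields) > 3 else "",
        }
    return ports


def port_is_shut(entry: dict[str, str]) -> bool:
    """True when the port is both administratively down and link down."""
    return entry["admin"] == "down" and entry["link"] == "down"


# Output copied from this task.
SAMPLE_OUTPUT = """\
papaul@asw-b-codfw# run show interfaces ge-5/0/1 descriptions
Interface       Admin Link Description
ge-5/0/1        down  down DISABLED
"""

ports = parse_descriptions(SAMPLE_OUTPUT)
print(ports["ge-5/0/1"], port_is_shut(ports["ge-5/0/1"]))
```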

Change 534500 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt and production DNS for graphite2001

https://gerrit.wikimedia.org/r/534500

Change 534500 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt and production DNS for graphite2001

https://gerrit.wikimedia.org/r/534500

Papaul renamed this task from Decom graphite2001/WMF6160 to Decom graphite2001/WMF6160 replaced with WMF6403. Sep 5 2019, 5:14 PM
Papaul renamed this task from Decom graphite2001/WMF6160 replaced with WMF6403 to Decom graphite2001/WMF6160.

Change 531968 abandoned by RobH:
[operations/puppet@production] graphite2001 decom

Reason:

https://gerrit.wikimedia.org/r/531968

Change 531965 abandoned by RobH:
[operations/dns@master] decom graphite2001 dns entries

Reason:
old neglected patchset, no longer needed.

https://gerrit.wikimedia.org/r/531965