
Decom graphite2001/WMF6160
Closed, Resolved, Public

Description

This task will track the decommission of graphite2001/WMF6160

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (handled by wmf-decommission-host)
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, inventory for spare) - updated to planned since we aren't certain of host's future.
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) - asw-b-codfw:ge-5/0/1 - asset tag set to port description
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries & mgmt dns entries for the hostname, leave mgmt dns entries for asset tag in place
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS
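
The DebMonitor removal step above can be sketched as a small Python helper that only assembles the command for a given host. This is a minimal sketch: the function name `debmonitor_delete_cmd` is hypothetical, the URL and certificate paths are copied from the checklist, and nothing is executed. In practice the step is handled by wmf-decommission-host.

```python
import shlex
from typing import List

# URL pattern and certificate paths are taken verbatim from the checklist step.
DEBMONITOR_URL = "https://debmonitor.discovery.wmnet/hosts/{fqdn}"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def debmonitor_delete_cmd(host_fqdn: str) -> List[str]:
    """Build (but do not run) the curl command that deletes a host entry.

    Hypothetical helper for illustration only.
    """
    # Reject obviously malformed names so the URL cannot be mangled.
    if not host_fqdn or any(ch in host_fqdn for ch in " /"):
        raise ValueError("invalid host FQDN: %r" % host_fqdn)
    return [
        "sudo", "curl", "-X", "DELETE",
        DEBMONITOR_URL.format(fqdn=host_fqdn),
        "--cert", CERT_PATH,
        "--key", KEY_PATH,
    ]


if __name__ == "__main__":
    # Print the command as a shell-quoted string for review before running it.
    print(shlex.join(debmonitor_delete_cmd("graphite2001.codfw.wmnet")))
```

Returning an argument list rather than a formatted string keeps the hostname from being interpreted by a shell if the list is later passed to a process runner.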

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox status to offline and remove from rack in netbox.
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: asset tag mgmt dns entries removed.
  • - IF RECLAIM: remove the hostname label; the system is then already back in the 'spares pool' by having the asset tag name in netbox and a state of 'inventory'

Details

Related Gerrit Patches:
[operations/dns@master] DNS: Remove mgmt and production DNS for graphite2001
[operations/dns@master] decom graphite2001 dns entries
[operations/puppet@production] graphite2001 decom

Event Timeline

Restricted Application added a subscriber: Aklapper. Jul 23 2018, 3:42 PM
fgiunchedi moved this task from Backlog to Up next on the observability board. Oct 15 2018, 2:38 PM
RobH triaged this task as Medium priority. Dec 12 2018, 4:35 PM
RobH updated the task description.

I'm taking graphite2001 now to do some tests for prometheus v2 upgrade in T187987: 100% of Prometheus traffic served by Prometheus v2

Dzahn moved this task from Backlog to Decommission on the ops-codfw board. Apr 12 2019, 12:10 AM

@fgiunchedi Can you confirm this is no longer needed for that?

Yes confirmed, the host is good to be decom'd

MoritzMuehlenhoff updated the task description.

I wanted to check with @wiki_willy if we need to reclaim this to spares, or if we can decommission and pull it out of the rack.

The host was purchased on 2015-01-09, and will be 5 years old (and automatically decommissioned) on 2020-01-09. Since that is less than 5 months away, and spare pool systems tend to sit for a few months before reallocation, it may be easiest/best to just decommission and unrack.
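
The age reasoning above can be checked with a few lines of Python. This is a minimal sketch: the 5-year lifespan and the purchase date come from the comment, while the reference date of 2019-08-23 is an assumption taken from the adjacent timeline entries, and `refresh_date` is a hypothetical helper name.

```python
from datetime import date


def refresh_date(purchased: date, lifespan_years: int = 5) -> date:
    """Return the date on which a host reaches its lifespan (hypothetical helper)."""
    try:
        return purchased.replace(year=purchased.year + lifespan_years)
    except ValueError:
        # Purchased on Feb 29 and the target year is not a leap year.
        return purchased.replace(year=purchased.year + lifespan_years, day=28)


purchased = date(2015, 1, 9)   # from the comment above
as_of = date(2019, 8, 23)      # assumed reference date, not stated in the comment

end_of_life = refresh_date(purchased)
days_left = (end_of_life - as_of).days

print(end_of_life)  # 2020-01-09
print(days_left)    # 139, i.e. roughly four and a half months
```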

RobH updated the task description. Aug 23 2019, 7:06 PM
RobH updated the task description. Aug 23 2019, 7:09 PM

cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts: graphite2001.codfw.wmnet

  • graphite2001.codfw.wmnet
    • Removed from Puppet master and PuppetDB
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Removed from DebMonitor
RobH updated the task description. Aug 23 2019, 7:13 PM

Change 531965 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom graphite2001 dns entries

https://gerrit.wikimedia.org/r/531965

RobH renamed this task from Decom graphite2001 to Decom graphite2001/WMF6160. Aug 23 2019, 7:19 PM
RobH updated the task description.

@RobH - I'll leave it up to @Papaul, since he has a better idea on the chances of reusing the parts on this system. Thanks, Willy

RobH added a comment. Aug 23 2019, 7:34 PM

Ok, I synced with @wiki_willy about this and the comment above.

We're going to decommission this host, since it has 5 months of life left before it is 5+ years old. Willy was stating that this can go on the decom pile, and be used for spare parts or just stay on the decom pile as needed/determined by @Papaul as the codfw on-site.

Thanks!

RobH added a comment. Aug 23 2019, 7:35 PM

https://gerrit.wikimedia.org/r/c/operations/dns/+/531965

statsd.codfw.wmnet points to graphite2001.codfw.wmnet, so I'm not sure what to point this at.

I assume it is fine to just remove it entirely, since it was pointing at a server in the spare role.

I've assigned the patchset to @fgiunchedi for his approval/review/comment.
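
The concern above, an alias left pointing at a host that is about to lose its DNS entries, can be checked mechanically before merging. This is a minimal sketch over an in-memory mapping: only the statsd.codfw.wmnet entry comes from this task, the second record and the helper name `records_pointing_at` are hypothetical, and a real check would read the zone files in operations/dns instead.

```python
from typing import Dict, List


def _normalize(name: str) -> str:
    """Compare DNS names case-insensitively and ignore a trailing dot."""
    return name.rstrip(".").lower()


def records_pointing_at(records: Dict[str, str], target: str) -> List[str]:
    """Return the aliases whose destination is the host being removed."""
    wanted = _normalize(target)
    return sorted(
        alias for alias, dest in records.items() if _normalize(dest) == wanted
    )


records = {
    # From this task: statsd.codfw.wmnet points to graphite2001.codfw.wmnet.
    "statsd.codfw.wmnet": "graphite2001.codfw.wmnet.",
    # Hypothetical unrelated record, included only to show filtering.
    "example-alias.codfw.wmnet": "otherhost.codfw.wmnet.",
}

dangling = records_pointing_at(records, "graphite2001.codfw.wmnet")
print(dangling)  # ['statsd.codfw.wmnet']
```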

Change 531968 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] graphite2001 decom

https://gerrit.wikimedia.org/r/531968

RobH reassigned this task from RobH to Papaul. Aug 23 2019, 7:40 PM
RobH updated the task description.
RobH added a subscriber: RobH.
Papaul updated the task description. Sep 4 2019, 3:56 PM
Papaul updated the task description. Sep 4 2019, 6:02 PM
papaul@asw-b-codfw# run show interfaces ge-5/0/1 descriptions       
Interface       Admin Link Description
ge-5/0/1        down  down DISABLED
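
The switch output above can be checked in a script rather than by eye. This is a minimal sketch that assumes the whitespace-separated column layout shown here (Interface, Admin, Link, Description); the function name `parse_interface_descriptions` is hypothetical and other output formats are not handled.

```python
from typing import Dict


def parse_interface_descriptions(output: str) -> Dict[str, Dict[str, str]]:
    """Parse 'show interfaces ... descriptions' text into a dict keyed by port."""
    ports = {}
    for line in output.splitlines():
        fields = line.split(None, 3)
        # Data rows have an admin state of up/down in the second column;
        # this skips the CLI prompt line, the header row and blank lines.
        if len(fields) < 3 or fields[1] not in ("up", "down"):
            continue
        ports[fields[0]] = {
            "admin": fields[1],
            "link": fields[2],
            "description": fields[3].strip() if len(fields) > 3 else "",
        }
    return ports


sample = """papaul@asw-b-codfw# run show interfaces ge-5/0/1 descriptions
Interface       Admin Link Description
ge-5/0/1        down  down DISABLED
"""

port = parse_interface_descriptions(sample)["ge-5/0/1"]
print(port["admin"] == "down")  # True: the port is administratively disabled
```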

Change 534500 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt and production DNS for graphite2001

https://gerrit.wikimedia.org/r/534500

Change 534500 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt and production DNS for graphite2001

https://gerrit.wikimedia.org/r/534500

Papaul updated the task description. Sep 4 2019, 6:09 PM
Papaul renamed this task from Decom graphite2001/WMF6160 to Decom graphite2001/WMF6160 replaced with WMF6403. Sep 5 2019, 5:14 PM
Papaul renamed this task from Decom graphite2001/WMF6160 replaced with WMF6403 to Decom graphite2001/WMF6160.
Papaul closed this task as Resolved. Sep 9 2019, 3:00 PM

complete