
Decom mw2213
Closed, ResolvedPublic

Description

The server had memory, mainboard and power supply issues (T194172) and has been out of warranty since January, so it should be decommissioned.

This checklist can be copied and pasted into Phabricator hardware request tasks for reclaiming systems to spare or decom.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - replace with role(spare::system) in site.pp

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) aw-c-codfw:ge-4/0/37
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS
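
For anyone running the DebMonitor removal by hand rather than through wmf-decommission-host, the following is a minimal sketch of how the curl command from the checklist could be assembled for a given host. The function name `debmonitor_delete_command` and the FQDN check are illustrative assumptions, not part of any Wikimedia tooling; the URL, certificate and key paths are copied from the checklist step. The sketch only builds the command and does not execute it.

```python
import shlex
from typing import List

# Values copied from the checklist step above.
DEBMONITOR_HOSTS_URL = "https://debmonitor.discovery.wmnet/hosts/"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def debmonitor_delete_command(host_fqdn: str) -> List[str]:
    """Build (but do not run) the curl argv that deletes a host from DebMonitor.

    Hypothetical helper for illustration only.
    """
    # Basic sanity check so an empty or malformed name never reaches the URL.
    if not host_fqdn or "/" in host_fqdn or any(c.isspace() for c in host_fqdn):
        raise ValueError(f"invalid host FQDN: {host_fqdn!r}")
    return [
        "sudo", "curl", "-X", "DELETE",
        DEBMONITOR_HOSTS_URL + host_fqdn,
        "--cert", CERT_PATH,
        "--key", KEY_PATH,
    ]


if __name__ == "__main__":
    # Print the command for review before pasting it on neodymium/sarin.
    print(shlex.join(debmonitor_delete_command("mw2213.codfw.wmnet")))
```

Returning an argument list instead of a single string avoids shell quoting problems if the command is later passed to `subprocess.run`.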

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with resulting removal from rack and change to 'offline' status
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
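
The switch port noted in the checklist is written as `switch:interface`. As a rough sketch, assuming that format holds for other decom tasks as well, the note could be split into its two parts before the port configuration is removed. `SwitchPort` and `parse_switch_port` are hypothetical names used only for this illustration.

```python
from typing import NamedTuple


class SwitchPort(NamedTuple):
    """Switch name and interface, as noted on a decom task (illustrative type)."""
    switch: str
    interface: str


def parse_switch_port(note: str) -> SwitchPort:
    """Split a 'switch:interface' note into its parts.

    Assumes a single colon separates the switch name from the interface.
    """
    switch, sep, interface = note.strip().partition(":")
    if not sep or not switch or not interface:
        raise ValueError(f"expected 'switch:interface', got {note!r}")
    return SwitchPort(switch, interface)


if __name__ == "__main__":
    # The value recorded on this task.
    port = parse_switch_port("aw-c-codfw:ge-4/0/37")
    print(port.switch, port.interface)
```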

Event Timeline

Change 458139 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Decommission mw2213

https://gerrit.wikimedia.org/r/458139

Change 458139 merged by Muehlenhoff:
[operations/puppet@production] Decommission mw2213

https://gerrit.wikimedia.org/r/458139

Mentioned in SAL (#wikimedia-operations) [2018-09-06T07:10:30Z] <moritzm> run decomission_appserver on mw2213 (T203434)

MoritzMuehlenhoff triaged this task as Medium priority.
MoritzMuehlenhoff updated the task description.

wmf-decommission-host was executed by robh for mw2213.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
RobH updated the task description.
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH removed subscribers: ops-monitoring-bot, Stashbot, gerritbot.

Change 486545 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] mw2213 decom production dns entries

https://gerrit.wikimedia.org/r/486545

Change 486545 merged by RobH:
[operations/dns@master] mw2213 decom production dns entries

https://gerrit.wikimedia.org/r/486545

RobH updated the task description.

Change 486547 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] mw2213 decom

https://gerrit.wikimedia.org/r/486547

Change 486547 merged by RobH:
[operations/puppet@production] mw2213 decom

https://gerrit.wikimedia.org/r/486547

RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH moved this task from Backlog to Decommission on the ops-codfw board.
RobH subscribed.

Change 489271 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for mw2213

https://gerrit.wikimedia.org/r/489271

Change 489271 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for mw2213

https://gerrit.wikimedia.org/r/489271

Papaul updated the task description.

complete

jijiki mentioned this in Unknown Object (Task). Dec 4 2019, 4:11 PM