
Decommission kafka1018
Closed, Resolved (Public)

Description

In T181518 we swapped kafka1018 with kafka1023 due to an unrecoverable hw failure.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - add role::spare in site.pp

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host - CANNOT, HOST WON'T BOOT
  • - remove all remaining puppet references (include role::spare)
  • - power down host - CANNOT, HOST WON'T BOOT
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: mgmt dns entries removed.
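
As a rough illustration of the "puppet node clean, puppet node deactivate" step in the checklist above, the sketch below builds the two command lines for a given host. It only prints them and does not run anything. The helper name `puppet_cleanup_commands` and the fully qualified host name used in the example are assumptions for illustration and do not come from this task; `puppet node deactivate` also assumes that PuppetDB is in use.

```python
import shlex


def puppet_cleanup_commands(fqdn: str) -> list[list[str]]:
    """Build the argv lists for the puppet cleanup step (hypothetical helper).

    Nothing is executed here; the caller decides how and where to run them.
    """
    if not fqdn or any(ch.isspace() for ch in fqdn):
        raise ValueError("expected a single host name without whitespace")
    return [
        # Removes the node's certificate and cached data from the puppet master.
        ["puppet", "node", "clean", fqdn],
        # Marks the node as deactivated in PuppetDB (assumes PuppetDB is in use).
        ["puppet", "node", "deactivate", fqdn],
    ]


if __name__ == "__main__":
    # The FQDN below is an assumed example, not taken from the task.
    for argv in puppet_cleanup_commands("kafka1018.eqiad.wmnet"):
        print(shlex.join(argv))
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems if they are later passed to `subprocess.run`.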

Event Timeline

Ottomata triaged this task as Medium priority. Jan 16 2018, 7:32 PM
RobH subscribed.

All decommissioning should be tagged with #hw-requests.

RobH updated the task description.

So I cannot see kafka1018 on the switch stack in row D. @Cmjohnson, I cannot actually finish the non-interrupt steps, since the port isn't noted.

The host is currently powered off because its mainboard failed, so it should not come back online. However, the port should still be traced onsite and disabled.

Change 408870 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] kafka1018 decommission

https://gerrit.wikimedia.org/r/408870

Change 408871 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] kafka1018 decom, production dns

https://gerrit.wikimedia.org/r/408871

Change 408870 merged by RobH:
[operations/puppet@production] kafka1018 decommission

https://gerrit.wikimedia.org/r/408870

Change 408871 merged by RobH:
[operations/dns@master] kafka1018 decom, production dns

https://gerrit.wikimedia.org/r/408871

RobH removed a project: Patch-For-Review.
RobH updated the task description.

OK, ready for on-site wipe and unracking (plus the tracing and disabling of the switch port).

Change 427412 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing mgmt dns from kafka1018

https://gerrit.wikimedia.org/r/427412

Change 427412 merged by Cmjohnson:
[operations/dns@master] Removing mgmt dns from kafka1018

https://gerrit.wikimedia.org/r/427412

Cmjohnson updated the task description.

Removed from rack and network port updated (ge-8/0/0). Updated racktables and tracking sheet.