Decommission kafka1018
Closed, Resolved, Public

Description

In T181518 we swapped kafka1018 with kafka1023 due to an unrecoverable hw failure.

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - add role::spare in site.pp

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host - CANNOT, HOST WON'T BOOT
  • - remove all remaining puppet references (including role::spare)
  • - power down host - CANNOT, HOST WON'T BOOT
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: mgmt dns entries removed.
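
For anyone tracking this checklist outside the task description, here is a minimal sketch of it as data. The `Step` class and the helper names are hypothetical and are not part of any Wikimedia tooling. The step texts are paraphrased from the list above, and the only thing the code checks is that the non-interruptible steps form one contiguous block.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One checklist item (hypothetical model, for illustration only)."""
    text: str
    non_interruptible: bool = False
    skipped_reason: Optional[str] = None  # set when the step cannot be performed


DEAD = "host won't boot"  # reason given in the task for the two skipped steps

STEPS: List[Step] = [
    Step("confirm services offline from production use"),
    Step("set icinga checks to maint mode/disabled"),
    Step("remove from lvs/pybal active configuration"),
    Step("remove service group puppet/hiera/dsh config"),
    Step("add role::spare in site.pp"),
    Step("disable puppet on host", True, DEAD),
    Step("remove remaining puppet references", True),
    Step("power down host", True, DEAD),
    Step("disable switch port", True),
    Step("note switch port assignment on task", True),
    Step("remove production dns entries", True),
    Step("puppet node clean, puppet node deactivate", True),
    Step("wipe system disks (onsite)"),
    Step("unrack and update racktables (onsite)"),
    Step("remove switch port configuration"),
    Step("remove mgmt dns entries"),
]


def non_interruptible_span(steps: List[Step]) -> Tuple[int, int]:
    """Return (first, last) index of the non-interruptible block.

    Raises ValueError if there is no such block or if it is not contiguous.
    """
    idx = [i for i, s in enumerate(steps) if s.non_interruptible]
    if not idx:
        raise ValueError("no non-interruptible steps")
    first, last = idx[0], idx[-1]
    if idx != list(range(first, last + 1)):
        raise ValueError("non-interruptible steps must be contiguous")
    return first, last


def actionable(steps: List[Step]) -> List[Step]:
    """Steps that can actually be carried out (not marked as skipped)."""
    return [s for s in steps if s.skipped_reason is None]
```

Calling `non_interruptible_span(STEPS)` gives the index range that should be worked through in one sitting, and `actionable(STEPS)` drops the two steps that cannot be done on a host that will not boot.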

elukey created this task. Dec 15 2017, 8:24 AM
Restricted Application added a subscriber: Aklapper. Dec 15 2017, 8:24 AM
Dzahn added a subscriber: Dzahn. Dec 15 2017, 9:52 AM
fdans moved this task from Incoming to Radar on the Analytics board. Dec 18 2017, 4:22 PM
Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Jan 2 2018, 4:09 PM
Ottomata triaged this task as Normal priority. Jan 16 2018, 7:32 PM
RobH added a subscriber: RobH.

All decommissioning should be tagged with #hw-requests.

RobH claimed this task. Feb 7 2018, 8:39 PM
RobH updated the task description.
RobH added a comment. Feb 7 2018, 8:44 PM

So I cannot see kafka1018 on the switch stack in row D. @Cmjohnson, I cannot actually finish the non-interrupt steps, since the port isn't noted.

The host is currently powered off because its mainboard failed, so it should not come back online. However, the port should still be traced onsite and disabled.

Change 408870 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] kafka1018 decommission

https://gerrit.wikimedia.org/r/408870

Change 408871 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] kafka1018 decom, production dns

https://gerrit.wikimedia.org/r/408871

Change 408870 merged by RobH:
[operations/puppet@production] kafka1018 decommission

https://gerrit.wikimedia.org/r/408870

Change 408871 merged by RobH:
[operations/dns@master] kafka1018 decom, production dns

https://gerrit.wikimedia.org/r/408871

RobH reassigned this task from RobH to Cmjohnson. Feb 7 2018, 8:54 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.

Ok, ready for on-site wipe and unracking (plus the tracing and disabling of the switch port)

Dzahn removed a subscriber: Dzahn. Feb 7 2018, 10:28 PM
Cmjohnson moved this task from Decommission to Up next on the ops-eqiad board. Mar 28 2018, 5:50 PM

Change 427412 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing mgmt dns from kafka1018

https://gerrit.wikimedia.org/r/427412

Change 427412 merged by Cmjohnson:
[operations/dns@master] Removing mgmt dns from kafka1018

https://gerrit.wikimedia.org/r/427412

Cmjohnson updated the task description. Apr 18 2018, 3:45 PM
Cmjohnson closed this task as Resolved.

Removed from rack and network port updated (ge-8/0/0). Updated racktables and tracking sheet.