
Decommission stat1002.eqiad.wmnet
Closed, Resolved · Public

Description

This checklist can be copied and pasted into Phabricator hardware-request tasks for reclaiming systems to spares or decommissioning them.

  • all system services confirmed offline from production use (T173094)
  • set all Icinga checks to maintenance mode or disable them while the reclaim/decommission takes place
  • remove system from all LVS/PyBal active configuration
  • remove any service-group Puppet/Hiera/dsh config
  • remove the system from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • disable Puppet on host
  • remove all remaining Puppet references (including role::spare)
  • power down host
  • disable switch port
  • note the switch port assignment on this task for later removal: asw-a-eqiad:ge-2/0/11
  • remove production DNS entries
  • remove all ops/puppet repo references
  • run puppet node clean and puppet node deactivate, and remove the Salt key

END NON-INTERRUPTIBLE STEPS
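The Puppet/Salt cleanup step in the list above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used on this task: it assumes `puppet node clean` and `puppet node deactivate` are run on the Puppet master and `salt-key -y -d` on the Salt master, and the helper names `cleanup_commands` and `run` are hypothetical. Because those commands normally run on different hosts, treat the output as a list of commands to run in the right places rather than a single script to execute on one machine. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from typing import List


def cleanup_commands(fqdn: str) -> List[List[str]]:
    """Hypothetical helper: argv lists for the Puppet/Salt cleanup step.

    Assumption: the Puppet commands run on the Puppet master and the
    salt-key command runs on the Salt master.
    """
    return [
        ["puppet", "node", "clean", fqdn],       # revoke/remove the host's certificate
        ["puppet", "node", "deactivate", fqdn],  # mark the node inactive in PuppetDB
        ["salt-key", "-y", "-d", fqdn],          # delete the minion key without prompting
    ]


def run(fqdn: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in cleanup_commands(fqdn):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run("stat1002.eqiad.wmnet")
```

Keeping dry run as the default means nothing is revoked until someone has reviewed the exact commands, which matters for steps that cannot be undone once started.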

Please note that the system was unracked on 2017-08-11 for data recovery; the remaining steps cannot be completed until that recovery finishes (T173094).

  • system disks wiped (by onsite)
  • system unracked and decommissioned (by onsite), Racktables updated with the result
  • switch port configuration removed from the switch once the system is unracked
  • mgmt DNS entries removed

Event Timeline

I just removed all the Puppet references to stat1002 and disabled the alerts. Please sync with Chris and check https://phabricator.wikimedia.org/T173094 before proceeding any further :)

RobH updated the task description.
RobH moved this task from Backlog to Reclaim (Spares/Decommission) on the hardware-requests board.
Dzahn changed the task status from Open to Stalled. (Sep 7 2017, 2:36 PM)
Dzahn triaged this task as Medium priority.
Dzahn added a parent task: T173094: Remove stat1002.

Change 378220 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove any trace of stat1003 for decom

https://gerrit.wikimedia.org/r/378220

Change 378220 merged by Elukey:
[operations/puppet@production] Remove any trace of stat1003 for decom

https://gerrit.wikimedia.org/r/378220

elukey closed this task as a duplicate of T173094: Remove stat1002.
elukey updated the task description.

I removed the Puppet/Salt credentials and wiped Puppet, but didn't proceed any further because I didn't want to interfere with the DC-Ops procedure. I was under the impression that even the non-interruptible steps could be done without DC-Ops supervision, but I might have misread the procedure (if so, I am really sorry).

Nuria moved this task from Incoming to Radar on the Analytics board.

Change 447503 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] stat1002 decom prod dns

https://gerrit.wikimedia.org/r/447503

Change 447503 merged by RobH:
[operations/dns@master] stat1002 decom prod dns

https://gerrit.wikimedia.org/r/447503

RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH moved this task from Backlog to pending onsite steps (eqiad) on the decommission-hardware board.
RobH added a project: ops-eqiad.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.
RobH subscribed.

Change 451178 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing mgmt dns for decom host stat1002

https://gerrit.wikimedia.org/r/451178

Change 451178 merged by Cmjohnson:
[operations/dns@master] Removing mgmt dns for decom host stat1002

https://gerrit.wikimedia.org/r/451178

This server was given to Stroz, and we have a copy of its hard drive in the eqiad data center on an encrypted 16TB myduo drive. To get the password to unlock it, we have to contact Stroz.