
decommission logstash100[1-3]
Closed, Resolved (Public)

Description

The new logstash servers are in place, so we can start planning the decommissioning of the old ones. Note that we first need to ensure that all log producers have been migrated to logstash.svc.eqiad.wmnet.
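The bracket notation in the task title (logstash100[1-3]) and in the SAL entries (logstash100[123]) names three hosts. A minimal sketch of that expansion, assuming a single bracket group and no zero-padding; `expand_hosts` is a hypothetical helper for illustration, not a tool referenced in this task:

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracket group, e.g. 'logstash100[1-3]' or 'logstash100[123]'."""
    match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: already a single hostname
    prefix, body, suffix = match.groups()
    range_match = re.fullmatch(r"(\d+)-(\d+)", body)
    if range_match:
        # Range form: [1-3] covers 1, 2 and 3 inclusive.
        start, end = int(range_match.group(1)), int(range_match.group(2))
        items = [str(n) for n in range(start, end + 1)]
    else:
        # Character-class form: [123] contributes one character per host.
        items = list(body)
    return [f"{prefix}{item}{suffix}" for item in items]


print(expand_hosts("logstash100[1-3]"))
```

Both spellings used on this task expand to the same three hostnames.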

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if the system isn't shut down immediately during this process).

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)

logstash1001=>asw-a-eqiad:ge-4/0/13
logstash1002=>asw-c-eqiad:ge-4/0/2
logstash1003=>asw2-d-eqiad:ge-3/0/16
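The port assignments noted above follow a host=>switch:interface layout. A minimal sketch of reading those notes into a lookup table for the later switch cleanup; `parse_port_notes` is a hypothetical helper, and the layout is inferred only from the three lines recorded on this task:

```python
PORT_NOTES = """\
logstash1001=>asw-a-eqiad:ge-4/0/13
logstash1002=>asw-c-eqiad:ge-4/0/2
logstash1003=>asw2-d-eqiad:ge-3/0/16
"""


def parse_port_notes(text: str) -> dict[str, tuple[str, str]]:
    """Map each hostname to its (switch, interface) pair."""
    ports = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split 'host=>switch:interface' at the first '=>' and the first ':'.
        host, _, location = line.partition("=>")
        switch, _, interface = location.partition(":")
        if not (host and switch and interface):
            raise ValueError(f"unrecognised port note: {line!r}")
        ports[host] = (switch, interface)
    return ports


for host, (switch, interface) in parse_port_notes(PORT_NOTES).items():
    print(f"{host}: {interface} on {switch}")
```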

  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS
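For the "puppet node clean, puppet node deactivate" step, a dry-run sketch that only builds the command lines for review and executes nothing. The `eqiad.wmnet` host domain is an assumption taken from the service name in the description, and `cleanup_commands` is a hypothetical helper; where and how the commands are actually run is not recorded on this task.

```python
import shlex

HOSTS = ["logstash1001", "logstash1002", "logstash1003"]
DOMAIN = "eqiad.wmnet"  # assumption: host FQDNs share the service's domain


def cleanup_commands(hosts: list[str], domain: str) -> list[str]:
    """Build the two cleanup commands from the checklist for every host."""
    commands = []
    for host in hosts:
        fqdn = f"{host}.{domain}"
        commands.append(shlex.join(["puppet", "node", "clean", fqdn]))
        commands.append(shlex.join(["puppet", "node", "deactivate", fqdn]))
    return commands


# Print only, so the list can be checked before anything is run by hand.
for command in cleanup_commands(HOSTS, DOMAIN):
    print(command)
```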

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked.
  • - mgmt dns entries removed.

Event Timeline

debt triaged this task as Medium priority. Sep 14 2017, 5:33 PM

Change 383096 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] logstash: remove references to old logstash servers for decommissioning

https://gerrit.wikimedia.org/r/383096

Change 383096 merged by Muehlenhoff:
[operations/puppet@production] logstash: remove references to old logstash servers for decommissioning

https://gerrit.wikimedia.org/r/383096

Mentioned in SAL (#wikimedia-operations) [2017-11-02T10:53:31Z] <gehel> depooling logstash100[123] in preparation for decommission - T175830

Change 388026 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: cleanup JVM options for blazegraph

https://gerrit.wikimedia.org/r/388026

Change 395564 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] logstash: move eventlogging collection to logstash1007

https://gerrit.wikimedia.org/r/395564

Change 395565 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] logstash: decommission logstash100[1-3]

https://gerrit.wikimedia.org/r/395565

Change 395564 merged by Gehel:
[operations/puppet@production] logstash: move eventlogging collection to logstash1007

https://gerrit.wikimedia.org/r/395564

Mentioned in SAL (#wikimedia-operations) [2017-12-05T19:20:18Z] <gehel> moving eventlogging collection by logstash from logstash1003 to logstash1007, no messages should be lost - T175830

Mentioned in SAL (#wikimedia-operations) [2017-12-06T09:34:41Z] <gehel> shuttting down logstash / elasticsearch on logstash100[123] in preparation for decommission -T175830

Change 395565 merged by Gehel:
[operations/puppet@production] logstash: decommission logstash100[1-3]

https://gerrit.wikimedia.org/r/395565

Gehel updated the task description.
Gehel added a subscriber: RobH.

My steps for decommissioning are done (see checklist in the task description). Assigning to @RobH to continue.

Change 409433 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] logstash100[1-3] decommission

https://gerrit.wikimedia.org/r/409433

Change 409433 merged by RobH:
[operations/puppet@production] logstash100[1-3] decommission

https://gerrit.wikimedia.org/r/409433

Change 409434 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom logstash100[1-3] prod dns

https://gerrit.wikimedia.org/r/409434

Change 409434 merged by RobH:
[operations/dns@master] decom logstash100[1-3] prod dns

https://gerrit.wikimedia.org/r/409434

RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH added a project: ops-eqiad.
RobH moved this task from Backlog to Decommission on the ops-eqiad board.

ready for on-site wipe and unracking steps

Change 421570 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing mgmt dns logstash1001-1003

https://gerrit.wikimedia.org/r/421570

Change 421570 merged by Cmjohnson:
[operations/dns@master] Removing mgmt dns logstash1001-1003

https://gerrit.wikimedia.org/r/421570

Cmjohnson updated the task description.