T217556 Decommission old eqiad logstash hardware hosts logstash100[456]

Subject	Repo	Branch	Lines +/-
Removing mgmt dns entries for logstash1000[4-6]	operations/dns	master	+0 -12
logstash: remove logstash1004,1005,1006 from Hiera	operations/puppet	production	+0 -12
decom logstash100[456] prod dns	operations/dns	master	+3 -6
logstash100[456] decommission	operations/puppet	production	+1 -26

Status	Assigned	Task
Resolved	fgiunchedi	T213157 Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6)
Resolved	herron	T213898 Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch
Resolved	Jclark-ctr	T217556 Decommission old eqiad logstash hardware hosts logstash100[456]

herron triaged this task as Medium priority. Mar 4 2019, 2:56 PM

herron created this task.

herron mentioned this in T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch. Mar 4 2019, 2:59 PM

Peachey88 added a project: decommission-hardware. Mar 4 2019, 9:27 PM

RobH moved this task from Backlog to Ready for Decommission on the decommission-hardware board. Mar 7 2019, 9:04 PM

Decision on reclaim or decommission: These hosts were purchased on April 13, 2015, and support expired in April 2018. The systems are just shy of 4 years old. At 5 years, we would simply decommission when they go spare. A decision from @faidon or @mark needs to be attached to this task on whether to unrack and dispose of these hosts or reclaim them as out-of-warranty spares.
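For reference, the age arithmetic behind that comment can be checked with a short sketch. The purchase date and the 5-year threshold come from the comment; the comment date of March 7, 2019 is an assumption taken from the surrounding timeline entries, and `age_in_years` is a hypothetical helper.

```python
from datetime import date


def age_in_years(purchased: date, today: date) -> float:
    """Approximate age in years, using an average year length of 365.25 days."""
    return (today - purchased).days / 365.25


PURCHASED = date(2015, 4, 13)      # purchase date stated in the comment
COMMENT_DATE = date(2019, 3, 7)    # assumption: date of the surrounding timeline entries
DISPOSE_AFTER_YEARS = 5            # threshold stated in the comment

age = age_in_years(PURCHASED, COMMENT_DATE)
needs_decision = age < DISPOSE_AFTER_YEARS  # under 5 years: reclaim vs. dispose must be decided

print(f"age: {age:.1f} years, needs explicit decision: {needs_decision}")
```

Under those assumptions the hosts fall below the 5-year mark, which is why an explicit decision was requested instead of an automatic decommission.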

RobH moved this task from Backlog to Decommission on the ops-eqiad board. Mar 7 2019, 9:41 PM

RobH updated the task description.

Chatted with @faidon about this over IRC: we can dispose of these rather than reclaim them to spares. They'll be added to the decom tracking sheets and unracked.

RobH updated the task description. Mar 7 2019, 9:46 PM

wmf-decommission-host was executed by robh for logstash1004.eqiad.wmnet and performed the following actions:

Revoked Puppet certificate
Removed from PuppetDB
Downtimed host on Icinga
Downtimed mgmt interface on Icinga
Removed from DebMonitor

wmf-decommission-host was executed by robh for logstash1005.eqiad.wmnet and performed the following actions:

Revoked Puppet certificate
Removed from PuppetDB
Downtimed host on Icinga
Downtimed mgmt interface on Icinga
Removed from DebMonitor

wmf-decommission-host was executed by robh for logstash1006.eqiad.wmnet and performed the following actions:

Revoked Puppet certificate
Removed from PuppetDB
Downtimed host on Icinga
Downtimed mgmt interface on Icinga
Removed from DebMonitor
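The three runs above cover the hosts that the task title abbreviates as logstash100[456]. A minimal sketch of expanding that bracket shorthand into fully qualified names follows. It assumes a single bracket group containing single characters or simple `a-b` ranges, and the `eqiad.wmnet` domain seen in the log lines above; `expand_hosts` is a hypothetical helper and not part of wmf-decommission-host.

```python
import re


def _expand_choices(spec):
    """Expand a bracket body such as '456' or '4-6' into single characters."""
    choices = []
    i = 0
    while i < len(spec):
        # 'a-b' denotes an inclusive character range
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = spec[i], spec[i + 2]
            choices.extend(chr(c) for c in range(ord(start), ord(end) + 1))
            i += 3
        else:
            choices.append(spec[i])
            i += 1
    return choices


def expand_hosts(pattern, domain="eqiad.wmnet"):
    """Expand 'name[...]' shorthand into a list of FQDNs (one bracket group only)."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: treat the pattern as a single short hostname
        return [f"{pattern}.{domain}"]
    prefix, spec, suffix = match.groups()
    return [f"{prefix}{c}{suffix}.{domain}" for c in _expand_choices(spec)]


for fqdn in expand_hosts("logstash100[456]"):
    print(fqdn)
```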

Change 495142 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] logstash100[456] decommission

https://gerrit.wikimedia.org/r/495142

Change 495143 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom logstash100[456] prod dns

https://gerrit.wikimedia.org/r/495143

Change 495143 merged by RobH:
[operations/dns@master] decom logstash100[456] prod dns

https://gerrit.wikimedia.org/r/495143

Change 495142 merged by RobH:
[operations/puppet@production] logstash100[456] decommission

https://gerrit.wikimedia.org/r/495142

RobH reassigned this task from RobH to • Cmjohnson. Mar 7 2019, 10:09 PM

RobH removed a project: Patch-For-Review.

RobH updated the task description.

RobH moved this task from Ready for Decommission to pending onsite steps (eqiad) on the decommission-hardware board.

BEWARE. These hosts have not been removed from all places in puppet yet, though they are already gone from DNS. This caused issues on all logstash hosts today: when puppet reloaded the ferm rules due to an unrelated change, ferm failed to restart because it could no longer look up logstash1004 in DNS.

~/puppet$ grep -r logstash1004 *
hieradata/role/common/logstash.yaml:      - logstash1004.eqiad.wmnet
hieradata/role/common/logstash.yaml:      - logstash1004.eqiad.wmnet
hieradata/role/common/logstash/elasticsearch.yaml:      - logstash1004.eqiad.wmnet
hieradata/role/common/logstash/elasticsearch.yaml:      - logstash1004.eqiad.wmnet
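The same check as the grep above can be scripted so that it runs before the DNS records are removed. This is a minimal sketch, not an existing tool: `find_stale_references` is a hypothetical helper, and the repository path and host list passed to it are supplied by the caller.

```python
import os


def find_stale_references(repo_root, hostnames):
    """Return sorted (relative_path, line_number, hostname) tuples for every
    line under repo_root that still mentions one of the given hostnames."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Skip version-control metadata; it is not part of the puppet config
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        for host in hostnames:
                            if host in line:
                                rel = os.path.relpath(path, repo_root)
                                hits.append((rel, lineno, host))
            except OSError:
                # Unreadable file (broken symlink, permissions): skip it
                continue
    return sorted(hits)


# Example call (the path is a placeholder for a local puppet checkout):
# for rel, lineno, host in find_stale_references(
#         "/path/to/puppet", ["logstash1004", "logstash1005", "logstash1006"]):
#     print(f"{rel}:{lineno}: {host}")
```

An empty result would suggest the hostnames are no longer referenced in the checkout, so removing the DNS records should not break rules that resolve those names.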

Change 499433 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] logstash: remove logstash1004,1005,1006 from Hiera

https://gerrit.wikimedia.org/r/499433

gerritbot added a project: Patch-For-Review. Mar 27 2019, 10:08 AM

Dzahn reassigned this task from • Cmjohnson to herron. Mar 27 2019, 10:17 AM

Dzahn added a subscriber: • Cmjohnson.

Change 499433 merged by Dzahn:
[operations/puppet@production] logstash: remove logstash1004,1005,1006 from Hiera

https://gerrit.wikimedia.org/r/499433

2019-03-27

    11:04 mutante: re-enabled puppet on logstash1007 through 1011 - then on logstash*
    10:53 mutante: enabling and running puppet on logstash1007
    10:49 mutante: disabling puppet on logstash* via cumin

06:54 <+icinga-wm> RECOVERY - Check systemd state on logstash1007 is OK: OK - running: The system is fully operational
07:00 <+icinga-wm> RECOVERY - Check systemd state on logstash1008 is OK: OK - running: The system is fully operational
07:00 <+icinga-wm> RECOVERY - Check systemd state on logstash1009 is OK: OK - running: The system is fully operational
07:02 <+icinga-wm> RECOVERY - Check systemd state on logstash1010 is OK: OK - running: The system is fully operational
07:04 <+icinga-wm> RECOVERY - Check systemd state on logstash1011 is OK: OK - running: The system is fully operational

Now it should be OK to continue. At least I don't see the hosts in the puppet repo anymore, and the issue on logstash has been resolved.

Maintenance_bot removed a project: Patch-For-Review. May 22 2019, 3:36 PM

wipe is running on all 4 internal disks for T217556 and on the external usb disk for T212457.

RobH mentioned this in Unknown Object (Task). Jul 24 2019, 7:16 PM

@Jclark-ctr Please wipe logstash1004 and 1005, then remove them from the rack and update Netbox and the Google tracking sheet.
https://docs.google.com/spreadsheets/d/1JhjeV3cXfIzIyekJrnA2nNFFDGTT4SeLmyAFvDa4HmA/edit#gid=2026042311

• Cmjohnson updated the task description. Aug 8 2019, 3:07 PM

Dzahn unsubscribed. Aug 8 2019, 10:23 PM

Jclark-ctr updated the task description. Aug 9 2019, 9:34 PM

fgiunchedi added a project: observability. Aug 19 2019, 2:30 PM

@Jclark-ctr has this been done? We need the space in rack B2, so please make this a priority item. Thanks!

@Cmjohnson Finished wiping; I will be removing them from the rack shortly.

Change 531293 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing mgmt dns entries for logstash1000[4-6]

https://gerrit.wikimedia.org/r/531293

gerritbot added a project: Patch-For-Review. Aug 20 2019, 6:47 PM

Change 531293 merged by Cmjohnson:
[operations/dns@master] Removing mgmt dns entries for logstash1000[4-6]

https://gerrit.wikimedia.org/r/531293

Maintenance_bot removed a project: Patch-For-Review. Aug 21 2019, 6:10 PM

Jclark-ctr reassigned this task from Jclark-ctr to • Cmjohnson. Aug 22 2019, 7:49 PM

Jclark-ctr updated the task description.

Jclark-ctr claimed this task. Oct 11 2019, 10:32 PM

Jclark-ctr updated the task description. Oct 11 2019, 11:06 PM

fgiunchedi moved this task from Inbox to Radar on the observability board. Dec 9 2019, 11:35 AM

RobH unsubscribed. Mar 3 2020, 6:01 PM

RobH removed a project: DC-Ops. Apr 1 2020, 5:07 PM