Decommission analytics10[28-31,33-41]
Closed, Resolved · Public · 0 Estimated Story Points
Assigned To

Authored By

	elukey
	Jul 8 2019, 2:33 PM

Details

	Subject	Repo	Branch	Lines +/-
	Removing dns entries for analytics1028-31,33-41	operations/dns	master	+0 -39
	Decommission Hadoop test cluster	operations/puppet	production	+40 -165

Related Objects

Status	Assigned	Task
Resolved	None	T244211 Analytics Hardware for Fiscal Year 2019/2020
Resolved	Ottomata	T243521 Hadoop Hardware Orders FY2019-2020
Resolved	elukey	T255139 Create the new Hadoop test cluster
		Restricted Task
Resolved	elukey	T211836 Enable Security (stronger authentication and data encryption) for the Analytics Hadoop cluster and its dependent services
Resolved	• Cmjohnson	T227485 Decommission analytics10[28-31,33-41]
Resolved	• Cmjohnson	T233080 Decommission analytics1032

Event Timeline

Not actionable yet.

RobH assigned this task to elukey.Jul 15 2019, 6:09 PM

RobH moved this task from Backlog to Blocked on Service Owners on the decommission-hardware board.

elukey renamed this task from Decommission analytics10[28-41] to Decommission analytics10[28-31,33-41].Sep 17 2019, 8:03 AM
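The new title uses bracketed range notation to leave analytics1032 out (it is tracked separately in T233080). As a rough illustration only, the sketch below expands a single bracket group of this kind into host names. The function name `expand_hosts` is hypothetical and the parser is deliberately simplified; dedicated tools such as ClusterShell's NodeSet handle the general syntax.

```python
import re

# Hypothetical helper: expands ONE bracket group such as
# "analytics10[28-31,33-41]" or "analytics[1040-1041].eqiad.wmnet".
GROUP = re.compile(r"(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)")

def expand_hosts(pattern: str) -> list[str]:
    match = GROUP.fullmatch(pattern)
    if match is None:
        return [pattern]  # no bracket group: treat as a single host name
    hosts = []
    for part in match["body"].split(","):
        start, _, end = part.strip().partition("-")
        end = end or start       # "32" alone means the range 32-32
        width = len(start)       # preserve any zero padding
        for n in range(int(start), int(end) + 1):
            hosts.append(f"{match['prefix']}{n:0{width}d}{match['suffix']}")
    return hosts

hosts = expand_hosts("analytics10[28-31,33-41]")
print(len(hosts), "analytics1032" in hosts)
```
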

MoritzMuehlenhoff moved this task from Blocked on Service Owners to Ready for Decommission on the decommission-hardware board.Sep 27 2019, 10:43 AM

MoritzMuehlenhoff moved this task from Ready for Decommission to Blocked on Service Owners on the decommission-hardware board.

RobH unsubscribed.Mar 3 2020, 6:01 PM

RobH added a project: ops-eqiad.Apr 1 2020, 5:31 PM

RobH updated the task description.

RobH moved this task from Backlog to Decommission on the ops-eqiad board.Apr 1 2020, 5:51 PM

• Cmjohnson closed subtask T233080: Decommission analytics1032 as Resolved.May 21 2020, 9:49 PM

Is this still stalled nowadays?

elukey added a project: Analytics-Clusters.Jun 16 2020, 8:48 AM

Restricted Application added a project: Analytics.Jun 16 2020, 8:48 AM

elukey moved this task from Backlog to Q1 2020/2021 on the Analytics-Clusters board.Jun 16 2020, 9:30 AM

• fdans moved this task from Incoming to Operational Excellence on the Analytics board.Jun 18 2020, 4:17 PM

Dzahn unsubscribed.Jul 1 2020, 4:55 PM

Aklapper removed a project: Analytics.Jul 4 2020, 7:59 AM

elukey added a parent task: T255139: Create the new Hadoop test cluster.Aug 14 2020, 9:43 AM

wiki_willy mentioned this in T245161: Track down and replace very old HW.Aug 18 2020, 3:50 PM

Updating this task: we are setting up the new Hadoop test cluster. Once that is done, I'll clear all the Puppet config and set this task as actionable.

I'm removing the ops-eqiad tag, as this is hurting their open task metrics when it has never actually been within their ability to move this forward.

When this is ready for movement by ops-eqiad, add the tag back.

Change 630541 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission Hadoop test cluster

https://gerrit.wikimedia.org/r/630541

gerritbot added a project: Patch-For-Review.Sep 28 2020, 8:28 AM

Change 630541 merged by Elukey:
[operations/puppet@production] Decommission Hadoop test cluster

https://gerrit.wikimedia.org/r/630541

elukey added a project: ops-eqiad.Sep 28 2020, 8:40 AM

cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: analytics[1028-1029].eqiad.wmnet

analytics1028.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1029.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

COMMON_STEPS (WARN)
- Not all affected DC(s) have been migrated to automatic DNS, a manual patch to the operations/dns repository is required

cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: analytics[1030-1031,1033-1039].eqiad.wmnet

analytics1030.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1031.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1033.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1034.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1035.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1036.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1037.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1038.eqiad.wmnet (FAIL)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Failed to wipe bootloaders, manual intervention required to make it unbootable: Cumin execution failed (exit_code=2)
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1039.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

COMMON_STEPS (WARN)
- Not all affected DC(s) have been migrated to automatic DNS, a manual patch to the operations/dns repository is required

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: analytics[1040-1041].eqiad.wmnet

analytics1040.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

analytics1041.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped bootloaders
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

COMMON_STEPS (WARN)
- Not all affected DC(s) have been migrated to automatic DNS, a manual patch to the operations/dns repository is required
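Since the plain-text paste loses the bolding that the ERROR line refers to, here is a minimal sketch of how the summaries above could be scanned for failed steps. It assumes only the layout visible in this task (a `name (STATUS)` header followed by `- step` lines, with failed steps starting with "Failed"); the function names are hypothetical and this is not part of the cookbook itself.

```python
import re

# Header lines look like "analytics1038.eqiad.wmnet (FAIL)" in the paste above.
HEADER = re.compile(r"^(?P<name>\S+) \((?P<status>PASS|FAIL|WARN)\)$")

def summarize(log_text: str) -> dict:
    """Map each header name to its status and list of step lines."""
    results = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = {"status": header["status"], "steps": []}
            results[header["name"]] = current
        elif line.startswith("- ") and current is not None:
            current["steps"].append(line[2:])
        elif line:
            current = None  # any other non-blank line ends the section
    return results

def failed_steps(results: dict) -> dict:
    """Keep only FAIL sections and the steps that begin with 'Failed'."""
    return {
        name: [s for s in info["steps"] if s.startswith("Failed")]
        for name, info in results.items()
        if info["status"] == "FAIL"
    }

sample = """\
analytics1037.eqiad.wmnet (PASS)
- Wiped bootloaders
- Powered off

analytics1038.eqiad.wmnet (FAIL)
- Failed to wipe bootloaders, manual intervention required to make it unbootable: Cumin execution failed (exit_code=2)
- Powered off
"""
print(failed_steps(summarize(sample)))
```

Note that repeated section names such as COMMON_STEPS overwrite each other if several runs are fed in at once, so each run is best parsed separately.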

Maintenance_bot removed a project: Patch-For-Review.Sep 28 2020, 9:10 AM

elukey changed the task status from Stalled to Open.Sep 28 2020, 9:17 AM

elukey reassigned this task from elukey to • Cmjohnson.

elukey updated the task description.

The hosts here are showing up in a weird state. When running the DNS cookbook you get warnings that these hosts exist but are not "in devices", even though running the decommission cookbook above should have cleaned that up.

elukey removed a project: Analytics-Clusters.Oct 6 2020, 12:38 PM

• Cmjohnson updated the task description.Oct 7 2020, 2:52 PM

I ran the script in Netbox to remove all of these hosts and then ran the cookbook in cumin, removed them all from the racks, cleaned up the network switch, and disabled the ports.

Change 632736 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing dns entries for analytics1028-31,33-41

https://gerrit.wikimedia.org/r/632736

Change 632736 merged by Cmjohnson:
[operations/dns@master] Removing dns entries for analytics1028-31,33-41

https://gerrit.wikimedia.org/r/632736

All the DNS records have been manually removed as well.
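A removal like this could be spot-checked with a small resolver loop. The sketch below is only an illustration: `still_resolving` is a hypothetical name, the host list is limited to names that appear in this task, and the result reflects whatever resolver the machine running it uses (the `eqiad.wmnet` names are internal, so it would only be meaningful from inside the production network).

```python
import socket
from typing import Callable, Iterable

def still_resolving(
    hosts: Iterable[str],
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> list[str]:
    """Return the host names that the given resolver can still look up."""
    leftover = []
    for host in hosts:
        try:
            resolve(host)
        except socket.gaierror:
            # Lookup failed: the record is gone as far as this resolver can
            # tell. Temporary resolver errors also land here, so a clean
            # result is a hint rather than proof.
            continue
        leftover.append(host)
    return leftover

if __name__ == "__main__":
    names = [f"analytics{n}.eqiad.wmnet" for n in (1028, 1029, 1040, 1041)]
    print(still_resolving(names))
```

The resolver is passed in as a parameter so the check can be exercised with a stub instead of real DNS.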

Maintenance_bot removed a project: Patch-For-Review.Oct 7 2020, 4:11 PM