Maniphest T339201

decommission analytics1063.eqiad.wmnet
Closed, Resolved | Public | Request

Assigned To

	Jclark-ctr

Authored By

	Stevemunene
	Jun 15 2023, 9:17 AM

Tags

Referenced Files

None

Subscribers

Details

	Subject	Repo	Branch	Lines +/-
	analytics: Remove analytics106[1-3] from the HDFS topology	operations/puppet	production	+0 -3
	analytics: Decommission analytics106[1-3] from hadoop cluster	operations/puppet	production	+6 -6

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Stevemunene	T317861 Decommission analytics10[58-69]
		Resolved	Request	Jclark-ctr	T339201 decommission analytics1063.eqiad.wmnet

Event Timeline

Stevemunene created this task. Jun 15 2023, 9:17 AM

Stevemunene added a parent task: T317861: Decommission analytics10[58-69].

Change 930580 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] analytics: Decommission analytics106[1-3] from hadoop cluster

https://gerrit.wikimedia.org/r/930580

Change 930581 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] analytics: Remove analytics106[1-3] from the HDFS topology

https://gerrit.wikimedia.org/r/930581

Change 930580 merged by Stevemunene:

[operations/puppet@production] analytics: Decommission analytics106[1-3] from hadoop cluster

https://gerrit.wikimedia.org/r/930580

Change 930581 merged by Stevemunene:

[operations/puppet@production] analytics: Remove analytics106[1-3] from the HDFS topology

https://gerrit.wikimedia.org/r/930581

Maintenance_bot removed a project: Patch-For-Review. Jun 22 2023, 2:11 PM

Stevemunene updated the task description. Jun 26 2023, 5:34 AM

Mentioned in SAL (#wikimedia-analytics) [2023-07-06T11:18:36Z] <stevemunene> decommission analytics1063.eqiad.wmnet T339201

cookbooks.sre.hosts.decommission executed by stevemunene@cumin1001 for hosts: analytics1063.eqiad.wmnet

analytics1063.eqiad.wmnet (FAIL)
- Unable to find/resolve the mgmt DNS record, using the IP instead: 10.65.4.101
- Host not found on Icinga, unable to downtime it
- Found physical host
- Management interface not found on Icinga, unable to downtime it
- Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
- Host is already powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Wipe of swraid, partition-table and filesystem signatures was performed during the first run of the playbook.

Stevemunene moved this task from Backlog to pending onsite steps (eqiad) on the decommission-hardware board. Jul 10 2023, 1:25 PM

Stevemunene updated the task description. Jul 10 2023, 1:27 PM

Stevemunene reassigned this task from Stevemunene to Jclark-ctr. Jul 12 2023, 9:49 AM

Jclark-ctr added a project: ops-eqiad. Jul 12 2023, 1:53 PM

Jclark-ctr moved this task from Backlog to Decommission on the ops-eqiad board. Jul 12 2023, 1:56 PM

BTullis moved this task from Incoming to Needs Reporting on the Data-Platform-SRE board. Jul 17 2023, 8:57 AM

Maintenance_bot added a project: SRE. Jul 17 2023, 9:30 AM

Jclark-ctr closed this task as Resolved. Jul 17 2023, 1:21 PM

Jclark-ctr updated the task description.

Gehel moved this task from Needs Reporting to Done on the Data-Platform-SRE board. Jul 19 2023, 8:52 AM