⚓ T289657 Decommission mc[1019-1023,1025-1026,1028-1036].eqiad.wmnet

	Subject	Repo	Branch	Lines +/-
	Decommission old eqiad memcached hosts	operations/puppet	production	+0 -313
	Decommission mc1019, mc1020, mc1033, mc1034	operations/puppet	production	+2 -246

I will update the description once I have performed the service owner actions.

Maintenance_bot added a project: SRE. Aug 25 2021, 8:45 AM

Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Aug 25 2021, 5:21 PM

Cmjohnson claimed this task. Aug 25 2021, 6:14 PM

Hi @jijiki - hope all is well. We were wondering if it would be possible to prioritize the decom of mc1033 and 1034? It would help us with installing T285808 for @fgiunchedi's ms-be hosts. Thanks, Willy

wiki_willy mentioned this in T285808: Q1:(Need By: ASAP) rack/setup/install ms-be10[64-67]. Aug 25 2021, 6:28 PM

Cmjohnson reassigned this task from Cmjohnson to jijiki. Aug 25 2021, 6:45 PM

Cmjohnson subscribed.

In T289657#7309715, @wiki_willy wrote:

Hi @jijiki - hope all is well. We were wondering if it would be possible to prioritize the decom of mc1033 and 1034? It would help us with installing T285808 for @fgiunchedi's ms-be hosts. Thanks, Willy

I will do so. Sorry, this task slipped through the cracks a bit.

Awesome, thanks @jijiki!

jijiki renamed this task from Decommission mc[1019-1023,1025-1026,1028-1036].eqiad.wmnet (WIP) to Decommission mc[1019-1023,1025-1026,1028-1036].eqiad.wmnet. Sep 1 2021, 7:01 PM

jijiki updated the task description.

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc1034.eqiad.wmnet

mc1034.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc1019.eqiad.wmnet

mc1019.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc1020.eqiad.wmnet

mc1020.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

Change 716413 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] Decommission mc1019, mc1020, mc1033, mc1034

https://gerrit.wikimedia.org/r/716413

gerritbot added a project: Patch-For-Review. Sep 2 2021, 4:29 PM

@wiki_willy you can remove 1033 and 1034

Awesome, thanks so much @jijiki. (fyi for @Cmjohnson and @Jclark-ctr)

In T289657#7328872, @jijiki wrote:

@wiki_willy you can remove 1033 and 1034

Change 716413 abandoned by Effie Mouzeli:

[operations/puppet@production] Decommission mc1019, mc1020, mc1033, mc1034

Reason:

rebase hell

https://gerrit.wikimedia.org/r/716413

Change 716432 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] Decommission mc1019, mc1020, mc1033, mc1034

https://gerrit.wikimedia.org/r/716432

mc1019, mc1020, mc1033, and mc1034 have been removed from the rack.

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc1021.eqiad.wmnet

mc1021.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

COMMON_STEPS (FAIL)
- Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above
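The bold markers that the error line refers to do not survive in plain text, so failed blocks have to be found by their `(PASS)` / `(FAIL)` suffix instead. A minimal sketch of doing that, assuming the report layout seen in this task (a `name (STATUS)` header followed by `- step` lines); `parse_report` is a hypothetical helper and not part of the cookbook:

```python
import re

# A block header looks like "mc1021.eqiad.wmnet (PASS)" or "COMMON_STEPS (FAIL)".
HEADER = re.compile(r"^(\S+) \((PASS|FAIL)\)$")


def parse_report(text: str) -> dict[str, dict]:
    """Group the '- step' lines of a pasted report under their block header."""
    report: dict[str, dict] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = HEADER.match(line)
        if header:
            current = {"status": header.group(2), "steps": []}
            report[header.group(1)] = current
        elif line.startswith("- ") and current is not None:
            current["steps"].append(line[2:])
        # Any other line (blank lines, the trailing ERROR line) is ignored.
    return report


sample = """\
mc1021.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Powered off

COMMON_STEPS (FAIL)
- Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above
"""

report = parse_report(sample)
failed = [name for name, entry in report.items() if entry["status"] == "FAIL"]
print(failed)  # ['COMMON_STEPS']
```

Here the host itself passed and only the shared DNS step failed, which is why the run still ends with the generic error line.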

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc1022.eqiad.wmnet

mc1022.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

jijiki updated the task description. Sep 3 2021, 9:22 AM

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc[1025-1026].eqiad.wmnet

mc1025.eqiad.wmnet (FAIL)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Failed to power off, manual intervention required: Remote IPMI for mc1025.mgmt.eqiad.wmnet failed (exit=1): b''
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

mc1026.eqiad.wmnet (FAIL)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Failed to power off, manual intervention required: Remote IPMI for mc1026.mgmt.eqiad.wmnet failed (exit=1): b''
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above
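For the two hosts above, the report asks for manual intervention on the power-off step. One way to check the chassis state by hand is the `ipmitool` CLI; the sketch below only builds the command line and does not run it. The `root` user, the `lanplus` interface and the helper name `ipmi_power_command` are assumptions for illustration, not details taken from this task or from the cookbook.

```python
import shlex

# Subcommands accepted by "ipmitool chassis power".
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def ipmi_power_command(mgmt_fqdn: str, action: str = "status", user: str = "root") -> list[str]:
    """Build (but do not run) an ipmitool command for a management interface.

    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, so the password never appears in the argument list.
    """
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", mgmt_fqdn,
        "-U", user,  # assumed user name
        "-E",
        "chassis", "power", action,
    ]


command = ipmi_power_command("mc1025.mgmt.eqiad.wmnet")
print(shlex.join(command))
# ipmitool -I lanplus -H mc1025.mgmt.eqiad.wmnet -U root -E chassis power status
```

Checking `status` first is the conservative choice: if the host already reports powered off, no further action is needed before unracking.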

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc[1028-1032].eqiad.wmnet

mc1028.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

mc1029.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

mc1030.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

mc1031.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

mc1032.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc[1035-1036].eqiad.wmnet

mc1035.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

mc1036.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

jijiki updated the task description. Sep 3 2021, 12:47 PM

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc1023.eqiad.wmnet

mc1023.eqiad.wmnet (PASS)
- Downtimed host on Icinga
- Found physical host
- Downtimed management interface on Icinga
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: mc1027.eqiad.wmnet

mc1027.eqiad.wmnet (FAIL)
- Host steps raised exception: Host mc1027 was not found in Icinga status - no hosts have been downtimed.

ERROR: some step on some host failed, check the bolded items above

jijiki updated the task description. Sep 3 2021, 1:18 PM

jijiki reassigned this task from jijiki to Cmjohnson. Sep 3 2021, 1:20 PM

jijiki updated the task description.

cookbooks.sre.hosts.decommission executed by volans@cumin1001 for hosts: mc1027.eqiad.wmnet

mc1027.eqiad.wmnet (FAIL)
- Host steps raised exception: Host mc1027 was not found in Icinga status - no hosts have been downtimed.

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by volans@cumin1001 for hosts: mc1027.eqiad.wmnet

mc1027.eqiad.wmnet (FAIL)
- Host not found on Icinga, unable to downtme it
- Found physical host
- Management interface not found on Icinga, unable to downtme it
- Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

@Cmjohnson Can we please unrack mc1026 last? We are trying to fix a bug in the decom process. Thank you!

cookbooks.sre.hosts.decommission executed by volans@cumin2002 for hosts: mc1027.eqiad.wmnet

mc1027.eqiad.wmnet (FAIL)
- Host not found on Icinga, unable to downtme it
- Found physical host
- Management interface not found on Icinga, unable to downtme it
- Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
- No DNS record found for the mgmt interface mc1027.mgmt.eqiad.wmnet, trying the asset tag one: wmf6960.mgmt.eqiad.wmnet
- Powered off
- Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above
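The last mc1027 run shows a fallback: when the hostname-based management record has no DNS entry, the asset-tag-based record is tried instead. A minimal sketch of that lookup order follows. The function names, the lower-casing of the asset tag and the `<name>.mgmt.<site>.wmnet` pattern are assumptions inferred from the log line, not the cookbook's actual code.

```python
import socket
from typing import Callable


def resolves(fqdn: str) -> bool:
    """Return True if the name has at least one DNS record."""
    try:
        socket.getaddrinfo(fqdn, None)
        return True
    except socket.gaierror:
        return False


def pick_mgmt_fqdn(hostname: str, asset_tag: str, site: str,
                   exists: Callable[[str], bool] = resolves) -> str:
    """Prefer the hostname-based mgmt record, then fall back to the asset tag."""
    candidates = [
        f"{hostname}.mgmt.{site}.wmnet",
        f"{asset_tag.lower()}.mgmt.{site}.wmnet",
    ]
    for fqdn in candidates:
        if exists(fqdn):
            return fqdn
    raise LookupError(f"no mgmt DNS record found, tried: {', '.join(candidates)}")


# Offline demonstration: a stand-in resolver where only the asset tag record exists.
known_records = {"wmf6960.mgmt.eqiad.wmnet"}
chosen = pick_mgmt_fqdn("mc1027", "WMF6960", "eqiad", exists=known_records.__contains__)
print(chosen)  # wmf6960.mgmt.eqiad.wmnet
```

Passing the resolver as a parameter keeps the fallback order testable without touching real DNS.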

@Cmjohnson You can now remove any of the remaining hosts at any time, thank you!

Change 716432 merged by Effie Mouzeli:

[operations/puppet@production] Decommission old eqiad memcached hosts

https://gerrit.wikimedia.org/r/716432

Maintenance_bot removed a project: Patch-For-Review. Sep 9 2021, 1:10 PM

removed from rack and updated netbox

Change 902127 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] Revert "Add a dborch vm for testing the bullseye upgrade"

https://gerrit.wikimedia.org/r/902127

gerritbot added a project: Patch-For-Review. Mar 22 2023, 5:06 PM

cookbooks.sre.hosts.decommission executed by jhathaway@cumin1001 for hosts: dborch1002.wikimedia.org

dborch1002.wikimedia.org (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox

Decommission mc[1019-1023,1025-1026,1028-1036].eqiad.wmnet
Closed, Resolved | Public | Request

Description

mc1019

mc1020

mc1021

mc1022

mc1023

mc1025

mc1026

mc1028

mc1029

mc1030

mc1031

mc1032

mc1033

mc1034

mc1035

mc1036
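The host list above is the expansion of the bracket expression in the task title. A minimal sketch of that expansion, assuming a single bracket group of comma-separated numbers and ranges; `expand_hosts` is a hypothetical helper written for illustration and handles only this simple case:

```python
import re


def expand_hosts(expr: str) -> list[str]:
    """Expand 'mc[1019-1023,1025].eqiad.wmnet' into individual host names."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # no bracket group: treat the input as a single host
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a lone value such as "1027" is a range of one
        width = len(start)  # keep any zero padding from the expression
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


hosts = expand_hosts("mc[1019-1023,1025-1026,1028-1036].eqiad.wmnet")
print(len(hosts))  # 16
print(hosts[0], hosts[-1])  # mc1019.eqiad.wmnet mc1036.eqiad.wmnet
```

The count of 16 matches the number of hosts listed in the description.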


	jijiki
	Aug 25 2021, 8:40 AM
