T359141 decommission db2117.codfw.wmnet

	Subject	Repo	Branch	Lines +/-
	mariadb: Decommission db2117	operations/puppet	production	+1 -5

Status	Subtype	Assigned	Task
			Unknown Object (Task)
Resolved		Jhancock.wm	T355350 Q#:rack/setup/install db2196-db2220
Resolved		ABran-WMF	T355422 Productionize db2196-db2220
Resolved		ABran-WMF	T358741 Decommission db2096-db2120
Declined	Request	None	T358846 hw troubleshooting: not identified for db2117.codfw.wmnet
Resolved	Request	Jhancock.wm	T359141 decommission db2117.codfw.wmnet

Marostegui created this task (Mar 5 2024, 8:00 AM).

Marostegui added a parent task: T358846: hw troubleshooting: not identified for db2117.codfw.wmnet.

ABran-WMF claimed this task (Mar 5 2024, 8:32 AM).

ABran-WMF moved this task from Triage to Ready on the DBA board.

Change 1008815 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Decommission db2117

https://gerrit.wikimedia.org/r/1008815

Change 1008815 merged by Marostegui:

[operations/puppet@production] mariadb: Decommission db2117

https://gerrit.wikimedia.org/r/1008815

ABran-WMF reassigned this task from ABran-WMF to Marostegui (Mar 5 2024, 9:11 AM).

ABran-WMF subscribed.

Mentioned in SAL (#wikimedia-operations) [2024-03-05T09:12:45Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Remove db2117 T359141', diff saved to https://phabricator.wikimedia.org/P58456 and previous config saved to /var/cache/conftool/dbconfig/20240305-091244-marostegui.json

Marostegui removed Marostegui as the assignee of this task (Mar 5 2024, 9:15 AM).

Marostegui updated the task description.

Marostegui edited projects, added ops-codfw, DC-Ops; removed Patch-For-Review.

Marostegui added a subscriber: Jhancock.wm.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1002 for hosts: db2117.codfw.wmnet

db2117.codfw.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Alertmanager
- Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

@Jhancock.wm please see above: this host was unreachable due to a crash (T358846), so the decommissioning script could not wipe its boot loader. I don't know what is required from your end in this situation; just letting you know.

Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)

@Volans @MoritzMuehlenhoff is anything else required in this situation?
Thanks!

Marostegui moved this task from Ready to Done on the DBA board (Mar 5 2024, 9:17 AM).

Marostegui moved this task from Backlog to pending onsite steps (codfw) on the decommission-hardware board.

Marostegui mentioned this in T355422: Productionize db2196-db2220.

Marostegui mentioned this in T358741: Decommission db2096-db2120.

Marostegui added a parent task: T358741: Decommission db2096-db2120.

Maintenance_bot added a project: SRE (Mar 5 2024, 9:29 AM).

In T359141#9599610, @Marostegui wrote:

@Volans @MoritzMuehlenhoff is anything else required in this situation?

I think that's fine, nothing else to be done here. We wipe the bootloader to prevent service interruptions if the server gets accidentally powered on before it's unracked, but the disks will be crumbled anyway.

Thanks! @Jhancock.wm see above, you can proceed whenever you want.

Jhancock.wm moved this task from Backlog to Decommission on the ops-codfw board (Mar 5 2024, 3:40 PM).

Jhancock.wm closed this task as Resolved (Mar 7 2024, 1:25 AM).

Jhancock.wm claimed this task.

Jhancock.wm updated the task description.

decommission db2117.codfw.wmnet
Closed, Resolved · Public · Request