
Decommission prometheus4001
Closed, Resolved · Public · Request

Description

This task will track the decommission-hardware of server prometheus4001.

With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while the reclaim/decommission takes place (likely done by script)
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp and replace with role(spare::system); recommended to ensure services stay offline, but not strictly required as long as the decom script below is run IMMEDIATELY
  • - log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, Netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal, and a homer run.
  • - remove all remaining puppet references and all host entries in the puppet repo
  • - reassign the task from the service owner to a DC Ops team member and add the site project (ops-sitename) for the server's site
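The decom step above can be sketched as a shell session on a cumin host. This only assembles the cookbook invocation described in the task; the hostname and task ID here are taken from this ticket, and running the command for real requires cumin access:

```shell
# Placeholders from this ticket; substitute the real FQDN and Phab task.
HOST="prometheus4001.ulsfo.wmnet"
TASK="T335585"

# The decommission cookbook wipes the bootloader, powers the host down,
# updates Netbox, cleans/deactivates the Puppet node, removes the host
# from DebMonitor, and runs homer (per the description above).
CMD="sudo cookbook sre.hosts.decommission ${HOST} -t ${TASK}"
echo "${CMD}"
```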

End service owner steps / Begin DC-Ops team steps:

  • - system disks removed (by onsite)
  • - determine system age: systems under 5 years old are reclaimed to spares; systems over 5 years old are decommissioned
  • - IF DECOM: system unracked and decommissioned (by onsite), update netbox with result and set state to offline
  • - IF DECOM: mgmt dns entries removed.
  • - IF RECLAIM: set netbox state to 'inventory' and hostname to asset tag

Event Timeline

andrea.denisse renamed this task from "decommission prometheus4001" to "Decommission prometheus4001". Apr 28 2023, 2:47 PM
andrea.denisse changed the task status from Open to In Progress.
andrea.denisse created this task.

Change 913250 had a related patch set uploaded (by Andrea Denisse; author: Andrea Denisse):

[operations/puppet@production] prometheus: Decommission prometheus4001 in ulsfo

https://gerrit.wikimedia.org/r/913250

Change 913250 merged by Andrea Denisse:

[operations/puppet@production] prometheus: Decommission prometheus4001 in ulsfo

https://gerrit.wikimedia.org/r/913250

cookbooks.sre.hosts.decommission executed by denisse@cumin1001 for hosts: prometheus4001.ulsfo.wmnet

  • prometheus4001.ulsfo.wmnet (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • Failed to shutdown VM, manually run gnt-instance remove on the Ganeti master for the ulsfo cluster: Cumin execution failed (exit_code=2)
    • Started forced sync of VMs in Ganeti cluster ulsfo to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • Failed to remove VM, manually run gnt-instance remove on the Ganeti master for the ulsfo cluster: Cumin execution failed (exit_code=2)
    • Started forced sync of VMs in Ganeti cluster ulsfo to Netbox

ERROR: some step on some host failed, check the bolded items above

Mentioned in SAL (#wikimedia-operations) [2023-05-12T00:32:21Z] <denisse> manually removing prometheus4001.ulsfo.wmnet from the Ganeti master after a failed step in the decommission cookbook - T335585
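The manual cleanup mentioned in the SAL entry amounts to running the gnt-instance remove the cookbook asked for. A minimal sketch, assembling the command as the failure message directs (the actual command must be run on the Ganeti master for the ulsfo cluster, whose hostname is not given in this task):

```shell
# VM that the cookbook failed to remove automatically.
INSTANCE="prometheus4001.ulsfo.wmnet"

# gnt-instance remove deletes the instance and its disks from the
# Ganeti cluster; -f skips the interactive confirmation prompt.
GNT_CMD="sudo gnt-instance remove -f ${INSTANCE}"
echo "${GNT_CMD}"
```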

andrea.denisse changed the task status from In Progress to Open. May 12 2023, 12:39 AM
andrea.denisse removed andrea.denisse as the assignee of this task.
andrea.denisse updated the task description.
andrea.denisse added projects: DC-Ops, ops-ulsfo.

Cookbook cookbooks.sre.debmonitor.remove-hosts run by jmm: for 1 hosts: prometheus4001.ulsfo.wmnet

Since VMs don't need/warrant a hardware decom ticket, resolving.