⚓ T337269 decommission labstore100[45].eqiad.wmne

	Subject	Repo	Branch	Lines +/-
	Remove references to labstore100[45]	operations/puppet	production	+2 -11

Andrew created this task.May 22 2023, 8:29 PM

Restricted Application added a project: cloud-services-team. · View Herald TranscriptMay 22 2023, 8:29 PM

Change 922168 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Remove references to labstore100[45]

https://gerrit.wikimedia.org/r/922168

gerritbot added a project: Patch-For-Review.May 22 2023, 8:36 PM

Andrew updated the task description. (Show Details)May 22 2023, 8:37 PM

Change 922168 merged by Andrew Bogott:

[operations/puppet@production] Remove references to labstore100[45]

https://gerrit.wikimedia.org/r/922168

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: labstore1005.eqiad.wmnet

labstore1005.eqiad.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Management interface not found on Icinga, unable to downtime it
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Host steps raised exception: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above

Two things:

The decom script said this during both runs:

Traceback (most recent call last):
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/decommission.py", line 429, in run
    self._decommission_host(fqdn)
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/decommission.py", line 370, in _decommission_host
    configure_switch_interfaces(self.remote, netbox, netbox_data, self.spicerack.verbose)
  File "/srv/deployment/spicerack/cookbooks/sre/network/__init__.py", line 40, in configure_switch_interfaces
    switch_fqdn = nb_switch_interface.device.primary_ip.dns_name
AttributeError: 'NoneType' object has no attribute 'dns_name'

These hosts likely have drive shelves attached to them that should also be removed.

cookbooks.sre.hosts.decommission executed by andrew@cumin1001 for hosts: labstore1004.eqiad.wmnet

labstore1004.eqiad.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Management interface not found on Icinga, unable to downtime it
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Host steps raised exception: 'NoneType' object has no attribute 'dns_name'

ERROR: some step on some host failed, check the bolded items above

Maintenance_bot removed a project: Patch-For-Review.May 22 2023, 9:11 PM

Maintenance_bot added a project: SRE.May 22 2023, 9:29 PM

Jclark-ctr closed this task as Resolved.May 24 2023, 1:28 PM

Jclark-ctr updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2023-05-31T08:52:40Z] <moritzm> manually run puppet node clean/deactivate for labstore1004/1005 (which run into a traceback in the decom script) T337269

Cookbook cookbooks.sre.debmonitor.remove-hosts run by jmm: for 1 hosts: labstore1004.eqiad.wmnet

Cookbook cookbooks.sre.debmonitor.remove-hosts run by jmm: for 1 hosts: labstore1005.eqiad.wmnet

MoritzMuehlenhoff mentioned this in T337003: Decommission labstore1004 and labstore1005 once they're no longer used.May 31 2023, 8:56 AM

decommission labstore100[45].eqiad.wmne
Closed, ResolvedPublicRequest
Actions

Description

Details

Related Objects

Event Timeline

decommission labstore100[45].eqiad.wmneClosed, ResolvedPublicRequestActions

Description

Details

Related Objects

Event Timeline

decommission labstore100[45].eqiad.wmne
Closed, ResolvedPublicRequest
Actions