Decommission wtp10[25-48].eqiad.wmnet
Closed, Resolved (Public)
Authored By

	Clement_Goubert
	Sep 5 2022, 10:44 AM

Details

	Subject	Repo	Branch	Lines +/-
	wtp: Purge wtp servers following migration to parse	operations/puppet	production	+10 -40
	wtp: Purge wtp servers following migration to parse	operations/mediawiki-config	master	+0 -24

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Jclark-ctr	T317025 Decommission wtp10[25-48].eqiad.wmnet
		Resolved		Clement_Goubert	T307219 Put parse parse10[01-24] in production

Event Timeline

Clement_Goubert created this task.Sep 5 2022, 10:44 AM

Mentioned in SAL (#wikimedia-operations) [2022-09-05T11:36:40Z] <claime> Set wtp103[4-5].eqiad.wmnet inactive pending decommission https://phabricator.wikimedia.org/T317025

Mentioned in SAL (#wikimedia-operations) [2022-09-05T14:48:51Z] <claime> Set wtp103[6-7].eqiad.wmnet inactive pending decommission T317025

Clement_Goubert triaged this task as Medium priority.Sep 5 2022, 5:17 PM

Clement_Goubert moved this task from Incoming 🐫 to API Gateway 🥌 on the serviceops board.

Mentioned in SAL (#wikimedia-operations) [2022-09-06T12:05:00Z] <claime> Set wtp10[38-40].eqiad.wmnet inactive pending decommission T317025

Mentioned in SAL (#wikimedia-operations) [2022-09-06T15:15:13Z] <claime> Set wtp10[41-43].eqiad.wmnet inactive pending decommission T317025

Change 830802 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] wtp: Purge wtp servers following migration to parse

https://gerrit.wikimedia.org/r/830802

gerritbot added a project: Patch-For-Review.Sep 8 2022, 10:52 AM

Change 830803 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/mediawiki-config@master] wtp: Purge wtp servers following migration to parse

https://gerrit.wikimedia.org/r/830803

Change this task to a proper decommission checklist.

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 12:44 PM

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp1034.eqiad.wmnet

wtp1034.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 1:33 PM

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp1035.eqiad.wmnet

wtp1035.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 1:47 PM

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 1:50 PM

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp1036.eqiad.wmnet

wtp1036.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp1037.eqiad.wmnet

wtp1037.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 2:08 PM

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp[1038-1042].eqiad.wmnet

wtp1038.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1039.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1040.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1041.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1042.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 2:23 PM

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp[1043-1047].eqiad.wmnet

wtp1043.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1044.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1045.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1046.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1047.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 2:39 PM

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp[1025-1028,1048].eqiad.wmnet

wtp1025.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1026.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1027.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1028.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1048.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 2:57 PM

Clement_Goubert updated the task description. (Show Details)Sep 8 2022, 3:00 PM

Change 830803 merged by jenkins-bot:

[operations/mediawiki-config@master] wtp: Purge wtp servers following migration to parse

https://gerrit.wikimedia.org/r/830803

Mentioned in SAL (#wikimedia-operations) [2022-09-08T15:28:44Z] <cgoubert@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830803|wtp: Purge wtp servers following migration to parse (T317025)]] (duration: 12m 48s)

Mentioned in SAL (#wikimedia-operations) [2022-09-08T15:45:42Z] <cgoubert@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830803|wtp: Purge wtp servers following migration to parse (T317025)]] (duration: 04m 00s)

cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: wtp[1029-1033].eqiad.wmnet

wtp1029.eqiad.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Failed to power off, manual intervention required: Remote IPMI for wtp1029.mgmt.eqiad.wmnet failed (exit=1): b''
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1030.eqiad.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Failed to power off, manual intervention required: Remote IPMI for wtp1030.mgmt.eqiad.wmnet failed (exit=1): b''
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1031.eqiad.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Failed to power off, manual intervention required: Remote IPMI for wtp1031.mgmt.eqiad.wmnet failed (exit=1): b''
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1032.eqiad.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Failed to power off, manual intervention required: Remote IPMI for wtp1032.mgmt.eqiad.wmnet failed (exit=1): b''
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1033.eqiad.wmnet (FAIL)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Downtimed management interface on Icinga/Alertmanager
- Wiped all swraid, partition-table and filesystem signatures
- Failed to power off, manual intervention required: Remote IPMI for wtp1033.mgmt.eqiad.wmnet failed (exit=1): b''
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Change 830802 merged by Clément Goubert:

[operations/puppet@production] wtp: Purge wtp servers following migration to parse

https://gerrit.wikimedia.org/r/830802

Clement_Goubert changed the task status from Open to In Progress.Sep 9 2022, 9:56 AM

Clement_Goubert updated the task description. (Show Details)

Clement_Goubert moved this task from API Gateway 🥌 to Doing 😎 on the serviceops board.

Clement_Goubert added a project: ops-eqiad.Sep 9 2022, 9:59 AM

Clement_Goubert updated the task description. (Show Details)

wtp[1029-1033].eqiad.wmnet didn't power off correctly.

Maintenance_bot added a project: SRE.Sep 9 2022, 10:29 AM

Maintenance_bot removed a project: Patch-For-Review.

@Clement_Goubert: I'm replying to your IRC question here so that it doesn't get lost in the backlog.

The poweroff failed because the remote IPMI was not working for some reason. If you're interested in troubleshooting it, you could try following https://wikitech.wikimedia.org/wiki/Management_Interfaces, but because the host is now unreachable, none of the local solutions would work.

To reach the management interfaces, until the host is racked you should be able to use the asset tag: ${ASSET_TAG}.mgmt.${DC}.wmnet
The asset tag is recorded in Netbox on each device's page (so something like wmf1234.mgmt.eqiad.wmnet).
That said, given that the cookbook failed because the remote IPMI was not working, it's possible that SSH to the mgmt console might not work either. I did a quick test and I was able to ssh to wtp1033 though... so there is some hope ;)
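
For illustration, the asset-tag naming pattern above could be sketched in Python roughly as follows. This is only a sketch: the function name is hypothetical, and lowercasing the tag is an assumption based on the wmf1234 example, not something taken from any actual tooling.

```python
def mgmt_fqdn_from_asset_tag(asset_tag: str, dc: str) -> str:
    """Build the management FQDN from a Netbox asset tag.

    Follows the ${ASSET_TAG}.mgmt.${DC}.wmnet pattern described above.
    Lowercasing the tag is an assumption made for this sketch.
    """
    return f"{asset_tag.lower()}.mgmt.{dc}.wmnet"


# Example with the placeholder tag used in the comment above.
print(mgmt_fqdn_from_asset_tag("WMF1234", "eqiad"))
```

The resulting name can then be used as the SSH target for the management console, as in the quick test against wtp1033 mentioned above.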

Thanks, I was able to complete the servers' powerdown through the management interface by using the asset tag FQDN.
wtp[1029-1033].eqiad.wmnet are now powered off.

Clement_Goubert updated the task description. (Show Details)Sep 12 2022, 10:27 AM

cookbooks.sre.hosts.decommission executed by volans@cumin1001 for hosts: wtp[1028-1030]

wtp1028 (FAIL)
- No DNS record found for the mgmt interface wtp1028.mgmt.eqiad.wmnet, trying the asset tag one: wmf7047.mgmt.eqiad.wmnet
- Host not found on Icinga, unable to downtime it
- Found physical host
- Management interface not found on Icinga, unable to downtime it
- Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1029 (FAIL)
- No DNS record found for the mgmt interface wtp1029.mgmt.eqiad.wmnet, trying the asset tag one: wmf7048.mgmt.eqiad.wmnet
- Host not found on Icinga, unable to downtime it
- Found physical host
- Management interface not found on Icinga, unable to downtime it
- Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB

wtp1030 (FAIL)
- No DNS record found for the mgmt interface wtp1030.mgmt.eqiad.wmnet, trying the asset tag one: wmf7049.mgmt.eqiad.wmnet
- Host not found on Icinga, unable to downtime it
- Found physical host
- Management interface not found on Icinga, unable to downtime it
- Unable to connect to the host, wipe of swraid, partition-table and filesystem signatures will not be performed: Cumin execution failed (exit_code=2)
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
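
The first step reported in the run above (no DNS record for the hostname-based mgmt name, so the asset tag name is tried instead) can be sketched as a simple fallback. This is a minimal sketch and not the cookbook's actual implementation: pick_mgmt_fqdn and has_dns_record are hypothetical names, and the DNS check is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable


def has_dns_record(fqdn: str) -> bool:
    """Return True if the name resolves (hypothetical helper for this sketch)."""
    try:
        socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        return False
    return True


def pick_mgmt_fqdn(
    hostname: str,
    asset_tag: str,
    dc: str,
    exists: Callable[[str], bool] = has_dns_record,
) -> str:
    """Prefer <hostname>.mgmt.<dc>.wmnet, else fall back to the asset tag name."""
    by_name = f"{hostname}.mgmt.{dc}.wmnet"
    if exists(by_name):
        return by_name
    # Assumption: asset tag names are lowercase in DNS, as in the log above.
    return f"{asset_tag.lower()}.mgmt.{dc}.wmnet"


# Simulate the wtp1028 case: the hostname-based record is already gone.
print(pick_mgmt_fqdn("wtp1028", "WMF7047", "eqiad", exists=lambda name: False))
```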