decommission db1097.eqiad.wmnet
Closed, Resolved · Public · Request

Description

This task will track the hardware decommissioning of server db1097.eqiad.wmnet

With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.

db1097

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. https://gerrit.wikimedia.org/r/c/operations/puppet/+/610236
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove the host from site.pp and replace its role with role(spare::system); recommended to ensure services are offline, but not 100% required as long as the decom cookbook below is run IMMEDIATELY.
  • - log in to a cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task> (see the example after this list). This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal.
  • - remove all remaining puppet references (including role::spare) and all host entries in the puppet repo
  • - remove ALL dns entries except the asset tag mgmt entries.
  • - reassign task from service owner to a DC Ops team member based on the site of the server.
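
For reference, a minimal sketch of the cookbook invocation for this host, assuming it is run from cumin1001 (as in the log below); the task ID shown is a placeholder, not taken from this page:

  # Run from a cumin host; replace T123456 with the actual Phabricator task ID.
  sudo cookbook sre.hosts.decommission db1097.eqiad.wmnet -t T123456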

End service owner steps / Begin DC-Ops team steps:

  • - disable switch port / set description to asset tag if host isn't being unracked / remove from switch if being unracked (a hedged example of the switch-side change follows this list).
  • - system disks wiped (by onsite); disks will be shredded at disposal
  • - determine system age: systems under 5 years old are reclaimed as spares, over 5 years are decommissioned.
  • - IF DECOM: system unracked and decommissioned (by onsite); update Netbox with the result and set state to offline
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: mgmt dns entries removed.
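
As a hedged illustration of the switch-side steps above, a Junos-style sketch; the interface name, switch, and asset tag are hypothetical, the real values come from Netbox, and the # lines are annotations for the reader:

  # Hypothetical port and asset tag; look up the real values in Netbox.
  # If the host stays racked, set the description to the asset tag and disable the port:
  set interfaces ge-0/0/12 description WMF1234
  set interfaces ge-0/0/12 disable
  # IF DECOM: once the system is unracked, remove the port configuration entirely:
  delete interfaces ge-0/0/12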

Event Timeline

Change 612135 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Remove puppet references for db1097

https://gerrit.wikimedia.org/r/612135

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db1097.eqiad.wmnet

  • db1097.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga
    • Found physical host
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Failed to power off, manual intervention required: Remote IPMI for db1097.mgmt.eqiad.wmnet failed (exit=1): b''
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Change 612135 merged by Marostegui:
[operations/puppet@production] mariadb: Remove puppet references for db1097

https://gerrit.wikimedia.org/r/612135

I have powered off the host manually, as the IPMI connection was failing.
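
As a hedged sketch of what a manual power-off might look like, assuming a Dell iDRAC on the management interface (since remote IPMI itself was failing, the management controller's own CLI is one option; the commands below are illustrative, not taken from this task):

  # Log in to the management controller and power the chassis down.
  ssh root@db1097.mgmt.eqiad.wmnet
  racadm serveraction powerdown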

Change 612136 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Remove db1097 DNS

https://gerrit.wikimedia.org/r/612136

Change 612136 merged by Marostegui:
[operations/dns@master] wmnet: Remove db1097 DNS

https://gerrit.wikimedia.org/r/612136

Marostegui added subscribers: Jclark-ctr, wiki_willy.

@Jclark-ctr please note that this host has mainboard/memory issues, so let's label it as such. However, the disks and the BBU are perfectly usable and should be kept as spares. Can we set them aside somewhere so we have spare parts for hosts of a similar model that are no longer under warranty?

Change 616884 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing mgmt dns for decom host db1097

https://gerrit.wikimedia.org/r/616884

Change 616884 merged by Cmjohnson:
[operations/dns@master] Removing mgmt dns for decom host db1097

https://gerrit.wikimedia.org/r/616884

Cmjohnson updated the task description.