
Decommission cloudelastic1001-1004
Closed, Resolved, Public

Description

Creating this ticket as a parent task for the decommission of cloudelastic1001-1004.

We'll use the decommission form and follow the Server Lifecycle process.

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2024-02-16T15:53:11Z] <bking@cumin2002> START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic100[1-4]* for decom hosts - bking@cumin2002 - T357780

Mentioned in SAL (#wikimedia-operations) [2024-02-16T15:53:20Z] <bking@cumin2002> END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic100[1-4]* for decom hosts - bking@cumin2002 - T357780

Gehel triaged this task as High priority. Feb 19 2024, 3:44 PM

Change 1005151 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] cloudelastic: decom cloudelastic100[1-4]

https://gerrit.wikimedia.org/r/1005151

Change 1005151 merged by Ryan Kemper:

[operations/puppet@production] cloudelastic: decom cloudelastic100[1-4]

https://gerrit.wikimedia.org/r/1005151

cookbooks.sre.hosts.decommission executed by ryankemper@cumin2002 for hosts: cloudelastic[1001-1004].wikimedia.org

  • cloudelastic1001.wikimedia.org (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Failed to wipe swraid, partition-table and filesystem signatures, manual intervention required to make it unbootable: Cumin execution failed (exit_code=2)
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cloudelastic1002.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cloudelastic1003.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
  • cloudelastic1004.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above
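
The bold formatting that the error line refers to does not survive in a plain-text copy of this summary, so the failed step has to be found by reading the text. The following is a minimal sketch of how that could be done, not part of the cookbook itself. The function name `failed_steps` is hypothetical, and the script assumes the layout shown above: a host line ending in `(PASS)` or `(FAIL)`, followed by bulleted steps, where a failed step begins with the word "Failed".

```python
import re

# A host line looks like "  • cloudelastic1001.wikimedia.org (FAIL)".
HOST_RE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|FAIL)\)\s*$")
# Any other bulleted line is treated as a step belonging to the last host seen.
STEP_RE = re.compile(r"^\s*•\s+(.*\S)\s*$")


def failed_steps(summary: str) -> dict[str, list[str]]:
    """Map each FAIL host to the steps whose text starts with 'Failed'.

    Assumption: failed steps are worded "Failed ...", as in the summary above.
    """
    results: dict[str, list[str]] = {}
    current = None  # name of the FAIL host being read, or None
    for line in summary.splitlines():
        host = HOST_RE.match(line)
        if host:
            name, status = host.groups()
            current = name if status == "FAIL" else None
            if current is not None:
                results[current] = []
            continue
        step = STEP_RE.match(line)
        if step and current is not None and step.group(1).startswith("Failed"):
            results[current].append(step.group(1))
    return results
```

Run against the summary above, this would report only cloudelastic1001.wikimedia.org, with the "Failed to wipe swraid, partition-table and filesystem signatures" step as the item needing manual intervention.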

Gehel claimed this task.
Gehel subscribed.

DC ops steps are tracked in T358046, so we can close this.