decom cobalt
Open, Low, Public

Description

This task tracks the decommissioning of the server cobalt.wikimedia.org.

With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.

cobalt.wikimedia.org

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove the host's role in site.pp and replace it with role(spare::system); this is recommended to ensure services are offline, but not strictly required as long as the decom cookbook below is run IMMEDIATELY.
  • - login to cumin host and run the decom cookbook: cookbook sre.hosts.decommission <host fqdn> -t <phab task>. This does: bootloader wipe, host power down, netbox update to decommissioning status, puppet node clean, puppet node deactivate, debmonitor removal.
  • - remove all remaining puppet references (including role::spare) and all host entries in the puppet repo
  • - remove ALL dns entries except the asset tag mgmt entries.
  • - reassign task from service owner to DC ops team member depending on site of server: codfw = @Papaul, eqiad = @Jclark-ctr, all other sites = @RobH.
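
As a rough illustration of the two mechanical steps in the checklist above (running the decom cookbook and picking the DC-Ops assignee), here is a minimal Python sketch. The command shape and the site-to-assignee mapping are taken from the checklist itself; the helper names `decom_command` and `dc_ops_assignee` are hypothetical, and the task ID `T000000` is a placeholder because this task's own number does not appear in the text. The sketch only builds the command line; it does not run the cookbook.

```python
import shlex
from typing import List

# Site-to-assignee mapping copied from the checklist above.
DC_OPS_ASSIGNEE = {"codfw": "@Papaul", "eqiad": "@Jclark-ctr"}
DEFAULT_ASSIGNEE = "@RobH"  # "all other sites" in the checklist


def decom_command(fqdn: str, task_id: str) -> List[str]:
    """Build the argv for: cookbook sre.hosts.decommission <host fqdn> -t <phab task>."""
    if not fqdn or not task_id:
        raise ValueError("both the host FQDN and the Phabricator task ID are required")
    return ["cookbook", "sre.hosts.decommission", fqdn, "-t", task_id]


def dc_ops_assignee(site: str) -> str:
    """Return the DC-Ops team member for a site, per the checklist."""
    return DC_OPS_ASSIGNEE.get(site.lower(), DEFAULT_ASSIGNEE)


if __name__ == "__main__":
    # T000000 is a placeholder task ID (assumption), not this task's real number.
    argv = decom_command("cobalt.wikimedia.org", "T000000")
    print(shlex.join(argv))  # the command to run on the cumin host
    print(dc_ops_assignee("eqiad"))
```

Building the command as an argument list rather than a single string keeps quoting explicit if it is later passed to a process runner; the printed form is only for copy and paste on the cumin host.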

End service owner steps / Begin DC-Ops team steps:

  • - disable switch port / set to asset tag if host isn't being unracked / remove from switch if being unracked.
  • - set switch port description to asset tag
  • - system disks wiped (by onsite)
  • - hostname mgmt dns removed, leave asset tag mgmt dns entries
  • - set netbox state to 'inventory' and hostname to asset tag

Event Timeline

Dzahn created this task. Oct 22 2019, 4:42 PM
Restricted Application added a subscriber: Aklapper. Oct 22 2019, 4:42 PM
Dzahn changed the task status from Open to Stalled. Oct 22 2019, 4:42 PM
Dzahn triaged this task as Medium priority.

Change 545328 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: turn cobalt into a spare system (Do not merge)

https://gerrit.wikimedia.org/r/545328

Change 545330 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] ci: remove cobalt from firewall rules

https://gerrit.wikimedia.org/r/545330

Change 545333 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb: remove cobalt from ferm_misc rules

https://gerrit.wikimedia.org/r/545333

Change 545334 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] acme_chief: remove cobalt from authorized hosts

https://gerrit.wikimedia.org/r/545334

Change 545335 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: remove cobalt from ssh known_hosts file

https://gerrit.wikimedia.org/r/545335

Change 545336 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: remove cobalt from DHCP and partman

https://gerrit.wikimedia.org/r/545336

Change 545335 merged by Dzahn:
[operations/puppet@production] gerrit: remove cobalt from ssh known_hosts file

https://gerrit.wikimedia.org/r/545335

Change 545334 merged by Dzahn:
[operations/puppet@production] acme_chief: remove cobalt from authorized hosts

https://gerrit.wikimedia.org/r/545334

Change 545330 merged by Dzahn:
[operations/puppet@production] ci: remove cobalt from firewall rules

https://gerrit.wikimedia.org/r/545330

Change 547619 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: allow rsync of home dirs for server migrations

https://gerrit.wikimedia.org/r/547619

Change 547619 merged by Dzahn:
[operations/puppet@production] gerrit: allow rsync of home dirs for server migrations

https://gerrit.wikimedia.org/r/547619

Change 545336 merged by Dzahn:
[operations/puppet@production] install_server: remove cobalt from DHCP and partman

https://gerrit.wikimedia.org/r/545336

Change 545328 merged by Dzahn:
[operations/puppet@production] site: turn former Gerrit server into a spare system

https://gerrit.wikimedia.org/r/545328

Change 545333 merged by Dzahn:
[operations/puppet@production] mariadb: remove cobalt from ferm_misc rules

https://gerrit.wikimedia.org/r/545333

Change 547650 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove production IPs for cobalt.wikimedia.org

https://gerrit.wikimedia.org/r/547650

Is there any difference between this and T236747?

No, there isn't. Thanks for spotting the duplicate. Merged.

Dzahn changed the task status from Stalled to Open. Nov 1 2019, 6:12 PM
Dzahn updated the task description.
Dzahn changed the task status from Open to Stalled. Nov 1 2019, 6:25 PM
Dzahn removed a project: Patch-For-Review.
Dzahn updated the task description.

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: cobalt.wikimedia.org

  • cobalt.wikimedia.org (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Dzahn updated the task description.

Change 548881 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: remove node cobalt.wikimedia.org

https://gerrit.wikimedia.org/r/548881

Change 548881 merged by Dzahn:
[operations/puppet@production] site: remove node cobalt.wikimedia.org

https://gerrit.wikimedia.org/r/548881

Change 547650 merged by Dzahn:
[operations/dns@master] remove production IPs for cobalt.wikimedia.org

https://gerrit.wikimedia.org/r/547650

Dzahn changed the task status from Stalled to Open. Nov 5 2019, 11:49 PM
Dzahn reassigned this task from Dzahn to Jclark-ctr.
Dzahn lowered the priority of this task from Medium to Low.
Dzahn removed a project: Patch-For-Review.
Dzahn updated the task description.
Dzahn added a comment. Nov 5 2019, 11:53 PM

@wiki_willy

Purchase date Dec. 4, 2015
Support contract —
Support expiry date Dec. 5, 2018

^ I guess this means we'll keep it around for another year or so in the spare pool.

@Dzahn - sounds good to me.

RobH updated the task description. Nov 6 2019, 12:56 AM
RobH updated the task description.
hashar added a subscriber: hashar. Nov 6 2019, 9:38 AM

Thank you @Dzahn for all the clean up tasks!

RobH removed a subscriber: RobH. Nov 6 2019, 4:27 PM