With the new Wikidough doh* hosts running on their dedicated domain and anycasted IP, we should decommission malmok to ensure complete separation of the doh* hosts. Because there is no discovery mechanism, some users may be impacted by this switch; existing users who have malmok configured as their DoH/DoT endpoint were notified in advance of the switch and told that the host will be shut down on July 15.
Event Timeline
Change 704125 had a related patch set uploaded (by Ssingh; author: Ssingh):
[operations/puppet@production] acme_chief: remove malmok's SNI and host from Wikidough certs
Change 705373 had a related patch set uploaded (by Ssingh; author: Ssingh):
[operations/puppet@production] site: remove decommissioned host malmok.wikimedia.org
cookbooks.sre.hosts.decommission executed by sukhe@cumin1001 for hosts: malmok.wikimedia.org
- malmok.wikimedia.org (PASS)
- Downtimed host on Icinga
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
- COMMON_STEPS (FAIL)
- Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)
ERROR: some step on some host failed, check the bolded items above
Change 705374 had a related patch set uploaded (by Ssingh; author: Ssingh):
[operations/homer/public@master] Remove malmok.wikimedia.org from anycast_neighbors in codfw
cookbooks.sre.hosts.decommission executed by sukhe@cumin1001 for hosts: malmok.wikimedia.org
- malmok.wikimedia.org (FAIL)
- Failed downtime host on Icinga (likely already removed)
- Host steps raised exception:
- COMMON_STEPS (FAIL)
- Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)
ERROR: some step on some host failed, check the bolded items above
Change 705373 merged by Ssingh:
[operations/puppet@production] site: remove decommissioned host malmok.wikimedia.org
Change 704125 merged by Ssingh:
[operations/puppet@production] acme_chief: remove malmok's SNI and host from Wikidough certs
cookbooks.sre.hosts.decommission executed by sukhe@cumin1001 for hosts: malmok.wikimedia.org
- malmok.wikimedia.org (FAIL)
- Failed downtime host on Icinga (likely already removed)
- Host steps raised exception:
ERROR: some step on some host failed, check the bolded items above
Change 705374 merged by jenkins-bot:
[operations/homer/public@master] Remove malmok.wikimedia.org from anycast_neighbors in codfw
Change 737120 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] install_server: remove malmok.wikimedia.org
I also don't see this host in DebMonitor. Does everything here seem done apart from the 2 entries in DHCP/installserver?
Change 737120 merged by Dzahn:
[operations/puppet@production] install_server: remove malmok.wikimedia.org
I think that should be it. The last time I ran the cookbook, it failed (as above) with the message "Failed downtime host on Icinga (likely already removed)", so I assumed the decommission had succeeded, because IIRC I had manually downtimed the host first.
https://phabricator.wikimedia.org/rOHPUbc7ec5ee1e0a55ac79a2753a798c5f974aaef4ec indicates I removed it from homer as well.
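The conclusion above comes from reading the cookbook summaries by hand. As a minimal sketch, assuming the log keeps the flattened format pasted in this task (a result line ending in "(PASS)" or "(FAIL)", followed by "- " step lines), the failing sections could be listed as follows. The names `summarize_cookbook_log` and `SAMPLE_LOG` are hypothetical and are not part of the sre.hosts.decommission cookbook; the sample text is copied from the second run in the timeline.

```python
import re

# Matches result lines such as "- malmok.wikimedia.org (FAIL)" or
# "- COMMON_STEPS (FAIL)": a single token followed by a status in parentheses.
RESULT_LINE = re.compile(r"^- (?P<name>\S+) \((?P<status>PASS|FAIL)\)$")

# Copied from the second cookbook run in this task (assumed format).
SAMPLE_LOG = """\
cookbooks.sre.hosts.decommission executed by sukhe@cumin1001 for hosts: malmok.wikimedia.org
- malmok.wikimedia.org (FAIL)
- Failed downtime host on Icinga (likely already removed)
- Host steps raised exception:
- COMMON_STEPS (FAIL)
- Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)
ERROR: some step on some host failed, check the bolded items above
"""


def summarize_cookbook_log(text):
    """Group "- " step lines under the result line that precedes them."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        match = RESULT_LINE.match(line)
        if match:
            # A new host or COMMON_STEPS section starts here.
            sections.append(
                {"name": match["name"], "status": match["status"], "steps": []}
            )
        elif line.startswith("- ") and sections:
            # Any other "- " line is a step of the current section.
            sections[-1]["steps"].append(line[2:])
    return sections


if __name__ == "__main__":
    for section in summarize_cookbook_log(SAMPLE_LOG):
        if section["status"] == "FAIL":
            print(section["name"], "failed:")
            for step in section["steps"]:
                print("  ", step)
```

Grouping steps under each result line keeps the "likely already removed" downtime message visibly separate from the sre.dns.netbox failure, which is the distinction the last comment relies on.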