Decommission malmok.wikimedia.org
Closed, Resolved, Public

Description

With the new Wikidough doh* hosts running on their own dedicated domain and anycast IP, we should decommission malmok to ensure complete separation of the doh* hosts. Because there is no discovery mechanism, some users may be affected by this switch. Existing users who have malmok configured as their DoH/DoT endpoint were notified in advance of the switch and told that the host will be shut down on July 15.

Event Timeline

Change 704125 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] acme_chief: remove malmok's SNI and host from Wikidough certs

https://gerrit.wikimedia.org/r/704125

Change 705373 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] site: remove decommissioned host malmok.wikimedia.org

https://gerrit.wikimedia.org/r/705373

cookbooks.sre.hosts.decommission executed by sukhe@cumin1001 for hosts: malmok.wikimedia.org

  • malmok.wikimedia.org (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above
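
The summary layout above (a host or COMMON_STEPS line with a PASS/FAIL status, followed by indented step bullets) is regular enough to check mechanically. The following is a minimal sketch, assuming the cookbook output is available as plain text in exactly this bullet format. `parse_summary` and `failed_sections` are hypothetical helpers written for illustration and are not part of the cookbook tooling.

```python
import re

# Matches a section line such as "  • malmok.wikimedia.org (PASS)".
SECTION_RE = re.compile(r"^\s*•\s+(?P<name>\S+)\s+\((?P<status>PASS|FAIL)\)\s*$")
# Matches any bullet line; used for the step lines listed under a section.
STEP_RE = re.compile(r"^\s*•\s+(?P<text>.+?)\s*$")


def parse_summary(text):
    """Return a list of (section_name, status, steps) tuples.

    Hypothetical helper: assumes the plain-text bullet layout shown in
    this task, not any documented output format.
    """
    sections = []
    for line in text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            sections.append((section["name"], section["status"], []))
            continue
        step = STEP_RE.match(line)
        if step and sections:
            # Attach the step to the most recently seen section.
            sections[-1][2].append(step["text"])
    return sections


def failed_sections(text):
    """Map each FAIL section name to the steps reported under it."""
    return {
        name: steps
        for name, status, steps in parse_summary(text)
        if status == "FAIL"
    }


# Excerpt of the first run recorded in this task.
SAMPLE = """\
  • malmok.wikimedia.org (PASS)
    • Downtimed host on Icinga
    • VM removed
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)
"""
```

Calling `failed_sections(SAMPLE)` returns only the COMMON_STEPS entry, which makes it easier to see that the host-level steps passed on the first run and that only the shared DNS step needs follow-up.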

Change 705374 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] Remove malmok.wikimedia.org from anycast_neighbors in codfw

https://gerrit.wikimedia.org/r/705374

cookbooks.sre.hosts.decommission executed by sukhe@cumin1001 for hosts: malmok.wikimedia.org

  • malmok.wikimedia.org (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Host steps raised exception:
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above

Change 705373 merged by Ssingh:

[operations/puppet@production] site: remove decommissioned host malmok.wikimedia.org

https://gerrit.wikimedia.org/r/705373

Change 704125 merged by Ssingh:

[operations/puppet@production] acme_chief: remove malmok's SNI and host from Wikidough certs

https://gerrit.wikimedia.org/r/704125

cookbooks.sre.hosts.decommission executed by sukhe@cumin1001 for hosts: malmok.wikimedia.org

  • malmok.wikimedia.org (FAIL)
    • Failed downtime host on Icinga (likely already removed)
    • Host steps raised exception:

ERROR: some step on some host failed, check the bolded items above

Change 705374 merged by jenkins-bot:

[operations/homer/public@master] Remove malmok.wikimedia.org from anycast_neighbors in codfw

https://gerrit.wikimedia.org/r/705374

Change 737120 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] install_server: remove malmok.wikimedia.org

https://gerrit.wikimedia.org/r/737120

Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

Confirmed that this host is not in Netbox and not in the DNS repo as of today.

I also don't see this host in debmonitor. It seems all done here besides the 2 entries in DHCP/installserver?

Change 737120 merged by Dzahn:

[operations/puppet@production] install_server: remove malmok.wikimedia.org

https://gerrit.wikimedia.org/r/737120

I think that should be it. The last time I ran the cookbook, it failed (as above) with the message "Failed downtime host on Icinga (likely already removed)", so I assumed the decommission had succeeded because, IIRC, I had manually downtimed the host first.

https://phabricator.wikimedia.org/rOHPUbc7ec5ee1e0a55ac79a2753a798c5f974aaef4ec indicates I removed it from homer as well.

ACK! :) Also not in Icinga, so it's gone from PuppetDB.
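
Several of the leftovers found in this task (the install server entries, the Homer neighbor entry) were plain-text references in repository checkouts. As a rough sketch of how such references could be located, assuming local checkouts of the relevant repositories, the hypothetical helper below walks a directory tree and reports every line containing the hostname. It does not cover service-side state such as Netbox, Icinga, or DebMonitor, which was checked by hand here.

```python
import os


def find_references(root, needle, max_bytes=1_000_000):
    """Yield (path, line_number, line) for each line under root containing needle.

    Hypothetical helper for illustration; `root` is assumed to be a local
    checkout, and files larger than max_bytes are skipped.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata in place so os.walk skips it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                if os.path.getsize(path) > max_bytes:
                    continue
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file; ignored in this sketch.
                continue


if __name__ == "__main__":
    # "." stands in for a repository checkout; adjust as needed.
    for path, number, line in find_references(".", "malmok.wikimedia.org"):
        print(f"{path}:{number}: {line}")
```

An empty result for each checkout is consistent with the host being fully removed from that repository, though it says nothing about state held outside version control.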