
Spicerack: improve support for Ganeti VMs
Closed, Resolved
Public

Description

To perform the same actions on VMs as on physical hosts (for example when decommissioning), the Netbox and Ganeti support in Spicerack needs some improvement.

Namely:

  • given an FQDN, an easy way to tell whether the host is physical or virtual (Ganeti). We could do that with a cumin query, but it is probably more correct to get the information from Netbox.
  • given the FQDN of a Ganeti VM, get the Ganeti cluster it belongs to (which in turn would allow performing Ganeti CLI actions). This should probably come from Netbox.
  • evaluate whether the Netbox module should also transparently support FQDNs in addition to hostnames
  • [optional] evaluate the possibility of exposing a GanetiCLI object from the Ganeti Spicerack interface, to allow running gnt commands directly on the right master host.

Event Timeline

Volans triaged this task as Medium priority. Aug 23 2019, 10:39 AM

Finding which cluster an instance belongs to, or whether a given FQDN is a Ganeti instance at all, could be done as easily as trying to look it up in every configured cluster and checking whether any information comes back. We could provide utility functions to perform those actions trivially.
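
The lookup described above can be sketched as follows. This is only an illustration, not Spicerack code: `find_cluster`, `is_ganeti_instance`, the `lookup` callable, and the in-memory `FAKE_INVENTORY` are all hypothetical names, and a real `lookup` would query each cluster instead of reading a dict.

```python
from typing import Callable, Iterable, Optional

# Hypothetical signature: (cluster, fqdn) -> instance info, or None if the
# cluster does not know the instance. A real version would query Ganeti.
Lookup = Callable[[str, str], Optional[dict]]


def find_cluster(fqdn: str, clusters: Iterable[str], lookup: Lookup) -> Optional[str]:
    """Return the first cluster that knows about fqdn, or None if none does."""
    for cluster in clusters:
        if lookup(cluster, fqdn) is not None:
            return cluster
    return None


def is_ganeti_instance(fqdn: str, clusters: Iterable[str], lookup: Lookup) -> bool:
    """A host is treated as a Ganeti VM if any configured cluster knows it."""
    return find_cluster(fqdn, clusters, lookup) is not None


# Illustrative stand-in for the per-cluster queries (made-up data).
FAKE_INVENTORY = {
    "cluster-a": {"vm1001.example.org": {"status": "running"}},
    "cluster-b": {"vm2001.example.org": {"status": "running"}},
}


def fake_lookup(cluster: str, fqdn: str) -> Optional[dict]:
    """Look the instance up in the fake inventory instead of a real cluster."""
    return FAKE_INVENTORY.get(cluster, {}).get(fqdn)


if __name__ == "__main__":
    print(find_cluster("vm2001.example.org", FAKE_INVENTORY, fake_lookup))
    print(is_ganeti_instance("db1001.example.org", FAKE_INVENTORY, fake_lookup))
```

The cost grows with the number of configured clusters, which is one reason getting the cluster from Netbox, as suggested in the description, may be preferable.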

Supporting FQDN in netbox should be easy enough.

There is already a utility function to get the master node for the Ganeti cluster, so making something that uses that to execute gnt commands would be very straightforward if that seems reasonable.

Change 533984 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/spicerack@master] ganeti: Add ability to get ganeti cluster for given instance

https://gerrit.wikimedia.org/r/533984

Change 533987 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/spicerack@master] netbox: Transparently support read-only operations for virtual machines

https://gerrit.wikimedia.org/r/533987

Okay, I've implemented changes to the netbox and ganeti modules, as linked above, which should allow all of the operations requested. I have not implemented writing status to Ganeti VMs, since this information is updated automatically, but it should be relatively straightforward to implement if desired.

Basically, the change to the ganeti module allows you to discover the cluster that a particular machine belongs to (and indeed, whether it belongs to any). The change to netbox treats VMs transparently as devices for read purposes, and marks the information returned by fetch_host_detail with is_virtual and cluster_name.
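
As a rough illustration of that behaviour (not the actual Spicerack implementation), the sketch below looks a name up among devices first and VMs second, then marks the result. Everything here is an assumption made for the example: the function name `host_detail_sketch`, the dict-based stand-ins for Netbox data, the `cluster` key on VM records, stripping the domain from an FQDN, and setting `cluster_name` only for VMs.

```python
from typing import Dict


def host_detail_sketch(name: str,
                       devices: Dict[str, dict],
                       virtual_machines: Dict[str, dict]) -> dict:
    """Return a copy of the host record, marked with is_virtual (and cluster_name for VMs)."""
    # Assumption: records are keyed by short hostname, so accept FQDNs too.
    hostname = name.split(".", 1)[0]

    if hostname in devices:
        detail = dict(devices[hostname])  # copy, do not mutate the source data
        detail["is_virtual"] = False
        return detail

    if hostname in virtual_machines:
        detail = dict(virtual_machines[hostname])
        detail["is_virtual"] = True
        # Assumption: the VM record carries its cluster under the "cluster" key.
        detail["cluster_name"] = detail["cluster"]
        return detail

    raise KeyError(f"host {hostname!r} is neither a device nor a virtual machine")


# Made-up sample data standing in for Netbox responses.
DEVICES = {"db1001": {"status": "active"}}
VIRTUAL_MACHINES = {"vm1001": {"status": "active", "cluster": "cluster-a"}}

if __name__ == "__main__":
    print(host_detail_sketch("vm1001.example.org", DEVICES, VIRTUAL_MACHINES))
    print(host_detail_sketch("db1001", DEVICES, VIRTUAL_MACHINES))
```

A caller can then branch on `is_virtual` without knowing which kind of record it was handed.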

As for running commands on the Ganeti host (the gnt* commands), that will be a separate patch if we still want it.

Change 533984 merged by CRusnov:
[operations/software/spicerack@master] ganeti: Add ability to get ganeti cluster for given instance

https://gerrit.wikimedia.org/r/533984

Change 533987 merged by CRusnov:
[operations/software/spicerack@master] netbox: Transparently support read-only operations for virtual machines

https://gerrit.wikimedia.org/r/533987

The non-optional ask on this is complete. I will leave this open to track the gnt* command proxying.

crusnov lowered the priority of this task from Medium to Low. Dec 9 2019, 6:28 AM

Currently we have a bunch of race conditions in the decommission path of VMs. The current actions can be summarized as:

  • an operator runs the decommission cookbook on a VM, removing it from PuppetDB
  • an operator removes the Ganeti VM manually from the cluster
  • at any point the auto-sync of ganeti will update Netbox removing the VM from it
  • at any point the PuppetDB report will run

Depending on the order of those 4 events, in some cases the PuppetDB Netbox report will alert for a while before auto-resolving once all 4 steps have run. This generates unnecessary noise and should be avoided.

I think that the shortest path to solve this is:

  • add simple support for Ganeti gnt-* commands to Spicerack, at least supporting remove with --force and --shutdown-timeout=0 for now.
  • add to the decommission cookbook the removal of the Ganeti VM directly, probably right before the PuppetDB removal
  • add to the decommission cookbook a forced run of the Ganeti VM sync for the affected cluster, to ensure Netbox is up to date

With this in place, the race condition will be limited to a few seconds between the PuppetDB removal and the Netbox sync update.
If deemed necessary, we could improve this even more by pausing the PuppetDB report during the execution of the cookbook.
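
To make the first point of the plan concrete, here is a minimal sketch of how the remove command line could be assembled with the two options named above. The helper name `build_remove_command` and its parameters are hypothetical, and this is not how Spicerack builds or runs the command; the resulting string would still have to be executed on the cluster's master host.

```python
import shlex


def build_remove_command(instance: str, shutdown_timeout: int = 0, force: bool = True) -> str:
    """Build a 'gnt-instance remove' command line for the given instance."""
    if shutdown_timeout < 0:
        raise ValueError("shutdown_timeout must be >= 0")

    parts = ["gnt-instance", "remove"]
    if force:
        parts.append("--force")  # skip the interactive confirmation
    parts.append(f"--shutdown-timeout={shutdown_timeout}")
    parts.append(instance)

    # Quote every argument so an odd instance name cannot break the shell command.
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    print(build_remove_command("vm1001.example.org"))
```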

Volans raised the priority of this task from Low to Medium. Jan 20 2020, 11:14 AM

Change 566054 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] ganeti: add initial support for gnt-instance

https://gerrit.wikimedia.org/r/566054

Change 566054 merged by jenkins-bot:
[operations/software/spicerack@master] ganeti: add initial support for gnt-instance

https://gerrit.wikimedia.org/r/566054

Change 567164 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] spicerack: add getter for the Netbox master host

https://gerrit.wikimedia.org/r/567164

Change 567168 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] ganeti: add cluster to GntInstance

https://gerrit.wikimedia.org/r/567168

Change 567169 had a related patch set uploaded (by Volans; owner: Volans):
[operations/cookbooks@master] sre.hosts.decommission: improve Ganeti VM support

https://gerrit.wikimedia.org/r/567169

I think that the shortest path to solve this is:

  • add simple support for Ganeti gnt-* commands to Spicerack, at least supporting remove with --force and --shutdown-timeout=0 for now.
  • add to the decommission cookbook the removal of the Ganeti VM directly, probably right before the PuppetDB removal
  • add to the decommission cookbook a forced run of the Ganeti VM sync for the affected cluster, to ensure Netbox is up to date

Plan implemented in the above CRs, pending review. The cookbook first issues the gnt shutdown, which is quick, then forces the Ganeti-Netbox sync, and performs the gnt remove only at the end, to reduce the race condition window with the Netbox report.
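
The ordering described here can be sketched as below. This is an illustration under stated assumptions, not the cookbook's code: `decommission_vm_sketch` and the three injected callables (`shutdown`, `sync_netbox`, `remove`) are hypothetical placeholders for whatever actually performs each step.

```python
from typing import Callable, List


def decommission_vm_sketch(fqdn: str,
                           cluster: str,
                           shutdown: Callable[[str], None],
                           sync_netbox: Callable[[str], None],
                           remove: Callable[[str], None]) -> List[str]:
    """Run the three steps in the order described above and return the step names."""
    steps: List[str] = []

    shutdown(fqdn)        # quick step, done first
    steps.append("shutdown")

    sync_netbox(cluster)  # force the Ganeti-Netbox sync for the affected cluster
    steps.append("sync")

    remove(fqdn)          # slow step, kept for last
    steps.append("remove")

    return steps


if __name__ == "__main__":
    calls: List[str] = []
    order = decommission_vm_sketch(
        "vm1001.example.org",
        "cluster-a",
        shutdown=lambda host: calls.append(f"shutdown {host}"),
        sync_netbox=lambda name: calls.append(f"sync {name}"),
        remove=lambda host: calls.append(f"remove {host}"),
    )
    print(order)
    print(calls)
```

Injecting the steps as callables keeps the ordering testable without touching a real cluster.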

Change 567175 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] netbox: rename injected property in host details

https://gerrit.wikimedia.org/r/567175

Change 567164 merged by jenkins-bot:
[operations/software/spicerack@master] spicerack: add getter for the Netbox master host

https://gerrit.wikimedia.org/r/567164

Change 567168 merged by jenkins-bot:
[operations/software/spicerack@master] ganeti: add cluster to instance()

https://gerrit.wikimedia.org/r/567168

Change 567175 merged by jenkins-bot:
[operations/software/spicerack@master] netbox: rename injected property in host details

https://gerrit.wikimedia.org/r/567175

Change 571780 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] ganeti: use canonical cluster names

https://gerrit.wikimedia.org/r/571780

Change 571997 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] ganeti: add logging for GntInstance actions

https://gerrit.wikimedia.org/r/571997

Change 571998 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/spicerack@master] ganeti: add VM creation capability

https://gerrit.wikimedia.org/r/571998

Change 571999 had a related patch set uploaded (by Volans; owner: Volans):
[operations/cookbooks@master] sre.ganeti.makevm: refactor for new spicerack

https://gerrit.wikimedia.org/r/571999

Change 571780 merged by jenkins-bot:
[operations/software/spicerack@master] ganeti: use canonical cluster names

https://gerrit.wikimedia.org/r/571780

Change 571997 merged by jenkins-bot:
[operations/software/spicerack@master] ganeti: add logging for GntInstance actions

https://gerrit.wikimedia.org/r/571997

Change 571998 merged by jenkins-bot:
[operations/software/spicerack@master] ganeti: add VM creation capability

https://gerrit.wikimedia.org/r/571998

Change 567169 merged by jenkins-bot:
[operations/cookbooks@master] sre.hosts.decommission: improve Ganeti VM support

https://gerrit.wikimedia.org/r/567169

Change 575017 had a related patch set uploaded (by Volans; owner: Volans):
[operations/cookbooks@master] sre.hosts.decommission: fix Ganeti VM decom path

https://gerrit.wikimedia.org/r/575017

Change 575017 merged by jenkins-bot:
[operations/cookbooks@master] sre.hosts.decommission: fix Ganeti VM decom path

https://gerrit.wikimedia.org/r/575017

Change 571999 merged by jenkins-bot:
[operations/cookbooks@master] sre.ganeti.makevm: refactor for new spicerack

https://gerrit.wikimedia.org/r/571999

This work has been superseded by more recent work on the Ganeti Spicerack module, which now offers an easy way to get all the information listed in the task description. See https://doc.wikimedia.org/spicerack/master/api/index.html#spicerack.Spicerack.netbox_server for more information.