Keep Ganeti VMs synchronized in Netbox
Closed, Resolved · Public

Description

  • Write a mass import / mass update script so that backlogged changes and out-of-sync changes can be synchronized with Netbox.

Event Timeline

crusnov triaged this task as Normal priority. Feb 5 2019, 3:53 AM
crusnov created this task.
Restricted Application added a subscriber: Aklapper. Feb 5 2019, 3:53 AM

The mass import script is the first piece of work under this task, and it is in progress.

gnt-instance list --reason="crusnov - starting netbox sync dev" -o name,os,status,mac,ip,be/maxmem,vcpus,disk.sizes,tags,sda_size,sdb_size --separator=' ' gives a list of instances in a machine-parsable format with all of the pertinent data for Netbox.
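A minimal sketch of how that output could be turned into records, assuming header lines have already been stripped and that no field value contains the separator. The function name and the sample line in the usage note are hypothetical; only the field list comes from the command above.

```python
# Field names as passed to -o in the gnt-instance command above.
FIELDS = ["name", "os", "status", "mac", "ip", "be/maxmem", "vcpus",
          "disk.sizes", "tags", "sda_size", "sdb_size"]


def parse_instance_lines(lines, fields=FIELDS, separator=" "):
    """Turn separator-delimited lines into a list of dicts keyed by field name.

    Assumption: one instance per line, no header line, and no value that
    itself contains the separator.
    """
    instances = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        values = line.split(separator)
        if len(values) != len(fields):
            raise ValueError("expected %d fields, got %d: %r"
                             % (len(fields), len(values), line))
        instances.append(dict(zip(fields, values)))
    return instances
```

For example, a made-up line such as "vm1.example.org debootstrap+default running aa:00:00:00:00:01 - 1024 2 10240 - 10240 -" would become one dict with "name" set to "vm1.example.org". All values stay strings; converting sizes or splitting tags is left to the caller.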

Also of note: that command exits with status 1 on non-master machines. One deployment option would therefore be to push the crontab (or whatever) to every Ganeti host and check the return status of that command to determine whether the host is the master; if it is not, the script would simply exit rather than attempting a sync.
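The exit-status check described above could look roughly like this. It is only a sketch: the function name is hypothetical, and the probe command is passed in by the caller rather than hard-coded.

```python
import subprocess


def run_if_master(probe_command, sync):
    """Run sync() only if probe_command exits with status 0.

    probe_command is an argv list, e.g. a gnt-instance list invocation,
    which per the observation above exits non-zero on non-master hosts.
    Returns True if the sync ran, False if it was skipped.
    """
    result = subprocess.run(
        probe_command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero status is an expected outcome, not an error
    )
    if result.returncode != 0:
        return False  # not the master: exit quietly without syncing
    sync()
    return True
```

One caveat with this approach is that any failure of the probe command, not just "not the master", is treated as a reason to skip the sync, so a broken master would go unnoticed without extra monitoring.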

crusnov added a comment. Edited · Feb 12 2019, 1:00 AM

Internal API equivalent: cl.Query(ganeti.constants.QR_INSTANCE, ['name','os','status','mac','ip','be/maxmem','vcpus','disk.sizes','tags'], ganeti.qlang.MakeSimpleFilter('name', None)).data, where cl is initialized from ganeti.cli.GetClient.

It raises ganeti.errors.OpPrereqError: ("This is not the master node, please connect to node 'ganeti1003.eqiad.wmnet' and rerun the command", 'wrong_input') on non-master nodes.
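A sketch of post-processing the .data result of the query above. It assumes each row is a sequence of (status, value) pairs in the order of the requested fields, and that a status of 0 means a normal value (mirroring what I believe ganeti.constants.RS_NORMAL to be). Both are assumptions to verify against the installed Ganeti version. The function name is hypothetical, and the code deliberately does not import ganeti so that it can be tried anywhere.

```python
# Assumption: mirrors ganeti.constants.RS_NORMAL; verify before relying on it.
RS_NORMAL = 0


def rows_to_dicts(field_names, rows):
    """Convert query rows of (status, value) pairs into dicts.

    Values whose status is not RS_NORMAL (unknown, unavailable, offline...)
    are mapped to None instead of being passed through.
    """
    records = []
    for row in rows:
        record = {}
        for name, (status, value) in zip(field_names, row):
            record[name] = value if status == RS_NORMAL else None
        records.append(record)
    return records
```

With this shape, the sync code can treat the CLI output and the internal API result the same way once both are lists of dicts.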

crusnov moved this task from Backlog to In Progress on the User-crusnov board.

Need to investigate setting up a read-only user for RAPI in ganeti if we want a sync program that runs without privileges.

I had a conversation with Alex about this. His suggestion is to write the sync script so that it is hosted on the Netbox instances and consumes RAPI from ganeti01.svc.*.wmnet, running periodically to sync the state into Netbox. Hooks would then be out of the picture entirely. This seems like a good avenue.
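A rough sketch of that approach, assuming RAPI on its default port 5080 with HTTP basic auth and the /2/instances?bulk=1 resource. The function names, the host and the credentials in the usage note are placeholders, and this is not the script that was eventually merged.

```python
import base64
import json
import urllib.request


def build_rapi_request(host, user, password, port=5080):
    """Build a GET request for the bulk instance list of a Ganeti RAPI endpoint."""
    url = "https://{}:{}/2/instances?bulk=1".format(host, port)
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


def fetch_instances(request):
    """Perform the request and decode the JSON list of instance dicts."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def plan_sync(ganeti_instances, netbox_vms):
    """Compare instance names on both sides and decide what to do with each.

    Both arguments are lists of dicts with at least a "name" key.
    """
    ganeti_names = {instance["name"] for instance in ganeti_instances}
    netbox_names = {vm["name"] for vm in netbox_vms}
    return {
        "create": sorted(ganeti_names - netbox_names),  # in Ganeti only
        "delete": sorted(netbox_names - ganeti_names),  # in Netbox only
        "check": sorted(ganeti_names & netbox_names),   # compare attributes
    }
```

A periodic run would call fetch_instances(build_rapi_request("ganeti.example.org", "ro-user", "secret")), list the cluster's VMs from Netbox (for instance via pynetbox), and then act on the plan. Keeping the comparison as a pure function makes a dry-run mode easy to add.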

So the procedure:

Dzahn added a subscriber: Dzahn. Feb 14 2019, 1:22 AM
crusnov moved this task from Backlog to In Progress on the SRE-tools board. Feb 14 2019, 5:52 PM
crusnov updated the task description. Feb 14 2019, 5:57 PM

For the deploy of the sync script:

  • Add the script to scripts/ in netbox-deploy and add pynetbox to freeze-requirements.sh.
  • Add a systemd timer unit on the Netbox master host (using a Puppet if-condition so that it deploys only to the master). See these timer examples, from IRC:
    10:14:47 <volans> icinga/templates/initscripts/update-etcd-mw-config-lastindex.timer.systemd.erb
    10:15:05 <volans> modules/icinga/templates/initscripts/update-etcd-mw-config-lastindex.systemd.erb
  • Make sure that Puppet sets up the timer unit only after the scap pull.

Change 490397 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add ganeti read-only user deployment

https://gerrit.wikimedia.org/r/490397

Change 490397 merged by CRusnov:
[operations/puppet@production] Add ganeti read-only user deployment

https://gerrit.wikimedia.org/r/490397

I've merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/492202 to fix the configuration and forced a Puppet run on A:ganeti, as ferm had failed on all of those hosts.

Change 492203 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] ganeti: Change ownership of rapi users file to match required ownership

https://gerrit.wikimedia.org/r/492203

Thanks!

One additional niggle once ownership is worked out: how to change the -b parameter to gnt-rapi. It sets the listen address for the RAPI daemon and is configured in /etc/default/ganeti. It is currently 127.0.0.1, and I don't see where that is set up in Puppet.

Change 492007 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox-deploy@master] Add ganeti->netbox sync script

https://gerrit.wikimedia.org/r/492007

One ongoing discussion about managing authorization tokens in Netbox is how to track where changes are coming from. The current general idea is to keep one read-only and one read-write production token in Puppet, so that regenerating a token is as easy as creating a new one and changing it in Puppet, at which point all consumers of the Netbox API are updated. The major downside is that we cannot tell precisely which script is making changes through the Netbox API. The initial idea was to generate a separate token for each use, but Netbox doesn't appear to track which token is used for a given API call, only the user ID, so this seems rather pointless (it is conceivable to patch Netbox to track and expose this information). The only definitive way is to create a separate user for each application, but the management overhead seems excessive and this is not preferred. Another option may be to add a changelog parameter to the API and have it recorded and exposed in the extras_objectchange record. The big question is: what level of tracking is desired? This is out of scope for now but will become pertinent as more scripts start changing Netbox's contents.

Change 492203 merged by CRusnov:
[operations/puppet@production] ganeti: Change ownership of rapi users file to match required ownership

https://gerrit.wikimedia.org/r/492203

Change 493348 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add configuration for the ganeti->netbox sync.

https://gerrit.wikimedia.org/r/493348

Change 493349 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add /etc/default/ganeti to allow rapi to listen to 0.0.0.0

https://gerrit.wikimedia.org/r/493349

Change 493349 merged by CRusnov:
[operations/puppet@production] Add /etc/default/ganeti to allow rapi to listen to 0.0.0.0

https://gerrit.wikimedia.org/r/493349

Change 493774 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add system timer for running ganeti->netbox sync.

https://gerrit.wikimedia.org/r/493774

Change 493348 merged by CRusnov:
[operations/puppet@production] Add configuration for the ganeti->netbox sync.

https://gerrit.wikimedia.org/r/493348

Change 492007 merged by CRusnov:
[operations/software/netbox-deploy@master] Add ganeti->netbox sync script

https://gerrit.wikimedia.org/r/492007

Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:45:52Z] <crusnov@deploy1001> Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229

Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:46:23Z] <crusnov@deploy1001> Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229 (duration: 00m 30s)

Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:49:39Z] <crusnov@deploy1001> Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229

Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:50:29Z] <crusnov@deploy1001> Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229 (duration: 00m 50s)

crusnov moved this task from In Progress to Pending on the User-crusnov board. Mar 14 2019, 8:33 PM

One thing that is missing is the set of physical devices that belong to a cluster; see https://netbox.wikimedia.org/virtualization/clusters/3/

It's probably something that this script should take care of IMHO. Thoughts?

I don't expect that to change all that often, but I agree that the script could take it into account (there is an API for those devices, of course). Now that the script is in place, it should be straightforward to modify.
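One way the node-to-cluster comparison could be sketched, assuming Ganeti reports nodes by FQDN while Netbox devices are named by short hostname (an assumption, as are the function name and data shapes below):

```python
def plan_node_assignments(ganeti_nodes, netbox_devices, cluster_name):
    """Decide which Netbox devices need their cluster field changed.

    ganeti_nodes: iterable of node FQDNs reported by Ganeti.
    netbox_devices: dict mapping device name to its current cluster
    name (or None when the device has no cluster).
    """
    # Assumption: Netbox device names are the first label of the FQDN.
    wanted = {fqdn.split(".", 1)[0] for fqdn in ganeti_nodes}
    return {
        # Nodes Ganeti knows about that have no device in Netbox at all.
        "missing": sorted(n for n in wanted if n not in netbox_devices),
        # Devices that exist but are not assigned to this cluster yet.
        "assign": sorted(n for n in wanted
                         if n in netbox_devices
                         and netbox_devices[n] != cluster_name),
        # Devices still assigned to the cluster but no longer Ganeti nodes.
        "unassign": sorted(n for n, cluster in netbox_devices.items()
                           if cluster == cluster_name and n not in wanted),
    }
```

Reporting "missing" separately, rather than creating devices, keeps the sync from inventing physical hardware records that should come from another source.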

This is deployed and works in production; it only needs the timer to be deployed.

Change 493774 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add system timer for running ganeti->netbox sync.

https://gerrit.wikimedia.org/r/493774

Change 493774 merged by CRusnov:
[operations/puppet@production] Add system timer for running ganeti->netbox sync.

https://gerrit.wikimedia.org/r/493774

crusnov moved this task from Pending to Complete on the User-crusnov board. Apr 8 2019, 5:01 PM

Mentioned in SAL (#wikimedia-operations) [2019-04-09T18:26:21Z] <crusnov@deploy1001> Started deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229

Mentioned in SAL (#wikimedia-operations) [2019-04-09T18:27:18Z] <crusnov@deploy1001> Finished deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229 (duration: 00m 57s)

faidon added a subscriber: faidon. Apr 22 2019, 11:42 PM

Should this be resolved?

crusnov closed this task as Resolved. Apr 23 2019, 2:50 AM
crusnov updated the task description.