- Write a mass import / mass update script so that past changes and out-of-sync changes can be synchronized with Netbox.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | crusnov | T213114 Q3 2018/19 Goal: TEC6: Build automated workflows for server provisioning (Tracking Task)
Resolved | | crusnov | T215229 Keep Ganeti VMs synchronized in Netbox
Event Timeline
The mass import script has been designated as the first task in this, and is in progress.
Running gnt-instance list --reason="crusnov - starting netbox sync dev" -o name,os,status,mac,ip,be/maxmem,vcpus,disk.sizes,tags,sda_size,sdb_size --separator=' ' gives a list of instances in a machine-parsable format with all of the pertinent data for Netbox.
Also of note: that command exits with status 1 on non-master machines. One deployment option would be to push the crontab (or equivalent) to every Ganeti host and check the return status of that command to determine whether the host is the master; if it is not, the script would simply exit rather than attempt a sync.
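The exit-status idea above could be sketched roughly as follows. This is only an illustration, not the script that was eventually deployed: the function names, the '|' separator and the use of --no-headers are my assumptions, and the field list is the shorter one from the internal API call.

```python
import subprocess

# Fields requested from gnt-instance list (subset of the command above).
FIELDS = ["name", "os", "status", "mac", "ip",
          "be/maxmem", "vcpus", "disk.sizes", "tags"]


def parse_instances(output, fields=FIELDS, separator="|"):
    """Turn separator-delimited gnt-instance output into a list of dicts."""
    instances = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = line.split(separator)
        if len(values) != len(fields):
            raise ValueError("unexpected column count in line: %r" % line)
        instances.append(dict(zip(fields, values)))
    return instances


def list_instances(run=subprocess.run, separator="|"):
    """Return parsed instances, or None when this host is not the master.

    Assumption: exit status 1 means "not the master", as observed above.
    `run` is injectable so the logic can be exercised without Ganeti.
    """
    cmd = ["gnt-instance", "list", "--no-headers",
           "-o", ",".join(FIELDS), "--separator=" + separator]
    result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                 universal_newlines=True)
    if result.returncode == 1:
        return None  # non-master host: nothing to sync from here
    if result.returncode != 0:
        raise RuntimeError("gnt-instance list failed: %s" % result.stderr)
    return parse_instances(result.stdout, separator=separator)
```

A cron job or timer on every host would call list_instances() and exit quietly when it returns None.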
Internal API equivalent: cl.Query(ganeti.constants.QR_INSTANCE, ['name','os','status','mac','ip','be/maxmem','vcpus','disk.sizes','tags'], ganeti.qlang.MakeSimpleFilter('name', None)).data, where cl is initialized from ganeti.cli.GetClient.
It raises ganeti.errors.OpPrereqError: ("This is not the master node, please connect to node 'ganeti1003.eqiad.wmnet' and rerun the command", 'wrong_input') on non-master nodes.
Need to investigate setting up a read-only user for RAPI in ganeti if we want a sync program that runs without privileges.
I had a conversation with Alex about this. His suggestion is to write the sync script for hosting on the netbox instances, and consume rapi from ganeti01.svc.*.wmnet, to be run periodically to sync the state into Netbox. Hooks would be right out of the picture. This seems like a good avenue.
So the procedure:
- Open rapi port to netmon*
- Add read-only user to ganeti's rapi authentication stuff http://docs.ganeti.org/ganeti/master/html/rapi.html#users-and-passwords
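Once the port is open and a read-only user exists, fetching the instance list over RAPI might look something like the sketch below. It assumes the default RAPI port (5080), HTTP basic auth and the /2/instances bulk resource; the host, user and password are placeholders, and the function names are hypothetical rather than taken from the actual sync script.

```python
import base64
import json
import urllib.request


def instances_url(host, port=5080):
    """Build the RAPI URL that returns full instance details in one call."""
    return "https://{}:{}/2/instances?bulk=1".format(host, port)


def fetch_instances(host, user, password, port=5080,
                    urlopen=urllib.request.urlopen, context=None):
    """Fetch all instances from RAPI as a list of dicts.

    `urlopen` is injectable for testing; `context` can carry an
    ssl.SSLContext if the RAPI certificate needs special handling.
    """
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(
        instances_url(host, port),
        headers={"Authorization": "Basic " + token,
                 "Accept": "application/json"},
    )
    with urlopen(request, context=context) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the user is read-only, a leaked credential from the Netbox host would not allow changes to the Ganeti cluster.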
For the deploy of the sync script:
- Add the script to scripts/ in the netbox-deploy and add pynetbox to the freeze-requirements.sh
- Add a timer unit to systemd on netbox master host (using a puppet if to only deploy to master). See timer examples:
10:14:47 <volans> icinga/templates/initscripts/update-etcd-mw-config-lastindex.timer.systemd.erb
10:15:05 <volans> modules/icinga/templates/initscripts/update-etcd-mw-config-lastindex.systemd.erb
- Make sure that the timer unit from puppet happens after the scap pull in puppet.
Change 490397 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add ganeti read-only user deployment
Change 490397 merged by CRusnov:
[operations/puppet@production] Add ganeti read-only user deployment
I've merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/492202 to fix the configuration and forced a puppet run on A:ganeti as ferm failed on all of them
Change 492203 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] ganeti: Change ownership of rapi users file to match required ownership
One additional niggle once ownership is worked out: how to change the -b parameter to gnt-rapi. It is set in /etc/default/ganeti and controls the listen address of the rapi daemon, which is currently 127.0.0.1. I'm not sure where that file is managed, since I can't find it in Puppet.
Change 492007 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox-deploy@master] Add ganeti->netbox sync script
One ongoing discussion we've been having about managing authorization tokens in Netbox is how to track where changes are coming from. Currently the general idea is to have one read-only and one read-write token used for production in Puppet, so that regenerating a token would be as easy as creating a new one and changing it in Puppet, at which point all consumers of the Netbox API are updated. The major downside of this is that we cannot tell precisely which script is interacting with or making changes through the Netbox API. The initial idea was to generate a separate token for each usage, but Netbox doesn't appear to track which token is used for any given API call, only the user ID, so this seems sort of pointless (it is conceivable to patch Netbox to track and expose this information). The only definitive way is to make separate users for each application, but the management overhead seems a bit ridiculous and is not preferred. Another option may be to add a changelog parameter to the API and have that recorded and exposed in the extras_objectchange record. I guess the big question is: what level of tracking is desired? This is out of scope for now but will become pertinent as more scripts start changing Netbox's contents.
Change 492203 merged by CRusnov:
[operations/puppet@production] ganeti: Change ownership of rapi users file to match required ownership
Change 493348 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add configuration for the ganeti->netbox sync.
Change 493349 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add /etc/default/ganeti to allow rapi to listen to 0.0.0.0
Change 493349 merged by CRusnov:
[operations/puppet@production] Add /etc/default/ganeti to allow rapi to listen to 0.0.0.0
Change 493774 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add system timer for running ganeti->netbox sync.
Change 493348 merged by CRusnov:
[operations/puppet@production] Add configuration for the ganeti->netbox sync.
Change 492007 merged by CRusnov:
[operations/software/netbox-deploy@master] Add ganeti->netbox sync script
Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:45:52Z] <crusnov@deploy1001> Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229
Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:46:23Z] <crusnov@deploy1001> Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229 (duration: 00m 30s)
Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:49:39Z] <crusnov@deploy1001> Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229
Mentioned in SAL (#wikimedia-operations) [2019-03-14T16:50:29Z] <crusnov@deploy1001> Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229 (duration: 00m 50s)
One thing that is missing is the physical devices that belong to a cluster, see https://netbox.wikimedia.org/virtualization/clusters/3/
It's probably something that this script should take care of IMHO. Thoughts?
I don't expect that to change all that often, but I agree that the script could take it into account (there is an API for those devices, of course). Now that it's in place it should be straightforward to modify.
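As a rough illustration of what taking the nodes into account could mean, the script could compare the node list reported by Ganeti with the devices Netbox currently lists for the cluster and work out what to assign or unassign. Everything here is hypothetical: the function names are made up, and it assumes Netbox devices are named by short hostname while Ganeti reports FQDNs.

```python
def short_name(fqdn):
    """Assumption: Netbox device names are the host part of the FQDN."""
    return fqdn.split(".", 1)[0]


def diff_cluster_nodes(ganeti_nodes, netbox_members):
    """Compare Ganeti node FQDNs with Netbox device names for one cluster.

    Returns the device names to assign to the cluster and the ones to
    unassign from it, each as a sorted list.
    """
    wanted = {short_name(node) for node in ganeti_nodes}
    current = set(netbox_members)
    return {
        "assign": sorted(wanted - current),
        "unassign": sorted(current - wanted),
    }
```

Computing the difference first keeps the number of Netbox API writes small, which matters for a job that runs periodically and usually finds nothing to change.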
Change 493774 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] Add system timer for running ganeti->netbox sync.
Change 493774 merged by CRusnov:
[operations/puppet@production] Add system timer for running ganeti->netbox sync.
Mentioned in SAL (#wikimedia-operations) [2019-04-09T18:26:21Z] <crusnov@deploy1001> Started deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229
Mentioned in SAL (#wikimedia-operations) [2019-04-09T18:27:18Z] <crusnov@deploy1001> Finished deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229 (duration: 00m 57s)