
Import row information into Netbox for Ganeti instances
Closed, Resolved · Public

Description

As part of supporting further integration with Netbox for automated tools, we should have at least row information for Ganeti VMs.

Some caveats:

  • Netbox doesn't have a field for rows for VMs. We can solve this with a custom field.
  • The rack that a particular instance lives in may change at any time and would have to be looked up from the server currently hosting it. We can solve this by not tracking the rack at all and leaving it out. The row info is still useful.

Event Timeline

crusnov triaged this task as Medium priority. Sep 9 2020, 5:14 PM
crusnov created this task.

@MoritzMuehlenhoff @Volans :
Instead of adding a custom field and machinery to keep it up to date, what do you think of reorganizing the existing data:
At least on the network level, it seems more appropriate to call a Netbox cluster what Ganeti calls a group.
For example create an "eqiad row A" cluster, that consists of the row A hypervisors, as they share the same failure domain.
Once done across all rows, create a "cluster group" that regroups all the eqiad clusters, behind ganeti01.svc.eqiad.wmnet

We might need to adapt the existing automation.

Fine with me, but I can't really tell what impact that would have on the longer tail of automation.

I totally agree, I think I had suggested the same in the past. We can do the changes on netbox-next so that we can try the automation changes there before rolling it to netbox prod.

Thanks, I'll look into the automation to get it to work with this new structure.

joanna_borun changed the task status from Open to In Progress. Dec 6 2021, 3:52 PM
joanna_borun moved this task from In Review to In Progress on the Infrastructure-Foundations board.

Change 802046 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/software/netbox-extras@master] Netbox Ganeti sync: add groups support

https://gerrit.wikimedia.org/r/802046

Change 802179 had a related patch set uploaded (by Volans; author: Volans):

[operations/software/netbox-extras@master] Netbox Ganeti sync: add groups support

https://gerrit.wikimedia.org/r/802179

Not sure yet if it's a good idea, but since Netbox 3.2 allows object custom fields, it's possible to add a link between a cluster and a row (now named "location") and/or a rack.
https://netbox-next.wikimedia.org/virtualization/clusters/59/

This shows up in the API as:

https://netbox-next.wikimedia.org/api/virtualization/clusters/59/
"custom_fields": {
    "Rack": null,
    "Row": {
        "id": 5,
        "url": "https://netbox-next.wikimedia.org/api/dcim/locations/5/",
        "display": "eqiad row A",
        "name": "eqiad row A",
        "slug": "eqiad-row-a",
        "_depth": 0
    }
},

I think it would make things easier to query and maintain later on rather than relying on parsing the cluster name.
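To illustrate the point about querying, here is a minimal sketch of reading the row from the structured custom field instead of parsing the cluster name. It only uses the JSON excerpt quoted above (wrapped in braces so it parses); the helper name `cluster_row` is hypothetical and not part of any existing tooling.

```python
import json

# Excerpt of the cluster payload quoted above, wrapped in braces to make it valid JSON.
SAMPLE = """
{
  "custom_fields": {
    "Rack": null,
    "Row": {
      "id": 5,
      "url": "https://netbox-next.wikimedia.org/api/dcim/locations/5/",
      "display": "eqiad row A",
      "name": "eqiad row A",
      "slug": "eqiad-row-a",
      "_depth": 0
    }
  }
}
"""


def cluster_row(cluster):
    """Return the slug of the location linked via the "Row" custom field, or None.

    Hypothetical helper: assumes the custom field is named "Row" as in the excerpt.
    """
    row = (cluster.get("custom_fields") or {}).get("Row")
    return row["slug"] if row else None


if __name__ == "__main__":
    print(cluster_row(json.loads(SAMPLE)))
```

A consumer can then compare slugs directly to find objects in the same failure domain, and a cluster with no linked row simply yields None.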

That could be an option, but I think that we should still map the VMs in 3 levels:

  • Netbox cluster groups <-> Ganeti clusters
  • Netbox cluster <-> Ganeti group
  • Netbox VM <-> Ganeti VM

Once we do that, we still need to give a name to the Netbox clusters, and I think the Ganeti group name is the most appropriate here. What parsing are you referring to?

Yep I fully agree.
My comment was meant for subsequent use of the data (see for example T229397: Puppet: get data (row, rack, site, and other information) from Netbox).

For example having to parse "Row_A" to figure out what other servers (virtual or physical) share the same failure domain.

Change 805337 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] Netbox: adapt ganeti-sync config file

https://gerrit.wikimedia.org/r/805337

Icinga downtime and Alertmanager silence (ID=aee08ee1-85f9-44de-8a66-77195873b06e) set by volans@cumin1001 for 4:00:00 on 1 host(s) and their services with reason: Adding support for Ganeti groups

netbox1002.eqiad.wmnet

Change 802179 merged by jenkins-bot:

[operations/software/netbox-extras@master] Netbox Ganeti sync: add groups support

https://gerrit.wikimedia.org/r/802179

Change 805337 merged by Volans:

[operations/puppet@production] Netbox: adapt ganeti-sync config file

https://gerrit.wikimedia.org/r/805337

Change 807507 had a related patch set uploaded (by Volans; author: Volans):

[operations/software/netbox-deploy@3-2-2] Add wmflib as additional dependency

https://gerrit.wikimedia.org/r/807507

Change 807507 merged by Volans:

[operations/software/netbox-deploy@3-2-2] Add wmflib as additional dependency

https://gerrit.wikimedia.org/r/807507

Change 802046 abandoned by Ayounsi:

[operations/software/netbox-extras@master] Netbox Ganeti sync: add groups support

Reason:

I90f1d54c6982496c1e10115aa0809f82dbe43ca7

https://gerrit.wikimedia.org/r/802046

Volans claimed this task.

We've solved the issue by using Cluster Groups in Netbox for Ganeti clusters and Clusters in Netbox for Ganeti groups.
We've also removed the old 'row_' prefix from all Ganeti groups, both to simplify the names and to avoid inconsistencies such as in DRMRS, where the redundancy is at rack level and row_B12 was an inconsistent name.
The ganeti-netbox sync script is taking care of keeping them in sync as before.
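For readers who want the resulting structure spelled out, the following is a rough sketch of the three-level mapping, not the actual ganeti-netbox sync script. The input shape, the `NetboxCluster` class, the `plan_sync` function and the VM names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NetboxCluster:
    """Illustrative stand-in for a Netbox cluster record (hypothetical shape)."""
    name: str                      # taken from the Ganeti group name
    group: str                     # Netbox cluster group, i.e. the Ganeti cluster
    vms: list = field(default_factory=list)


def plan_sync(ganeti):
    """Map {ganeti_cluster: {ganeti_group: [vm, ...]}} to Netbox-shaped objects.

    Returns (cluster_groups, clusters): one Netbox cluster group per Ganeti
    cluster and one Netbox cluster per Ganeti group, each listing its VMs.
    """
    cluster_groups = sorted(ganeti)
    clusters = [
        NetboxCluster(name=group, group=cluster_name, vms=sorted(vms))
        for cluster_name in cluster_groups
        for group, vms in sorted(ganeti[cluster_name].items())
    ]
    return cluster_groups, clusters


if __name__ == "__main__":
    # Hypothetical inventory; only the cluster FQDN appears in this task.
    inventory = {
        "ganeti01.svc.eqiad.wmnet": {
            "A": ["vm2.example", "vm1.example"],
            "B": ["vm3.example"],
        }
    }
    groups, clusters = plan_sync(inventory)
    print(groups)
    for cluster in clusters:
        print(cluster)
```

With this shape, a VM's row follows from the Netbox cluster it belongs to, so nothing needs updating per VM beyond its cluster membership when an instance moves between groups.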