Configure gnmic to collect data from routers at network pops
Closed, Resolved | Public

Description

We currently run gnmic on our netflow* VMs to collect statistics from network devices. The gnmic config on each VM is built by Puppet and includes an entry for each device under profile::netbox::data::network_devices that belongs to the local site. The code that builds this list is in modules/profile/manifests/gnmi_telemetry.pp:

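# Class parameter: all network devices known to Netbox, keyed by device name.
Hash[String[3], Netbox::Device::Network] $infra_devices = lookup('profile::netbox::data::network_devices'),
# Class body: keep switches and routers at the local site, then build one
# "fqdn:port" => {subscriptions} entry per device.
$targets = Hash($infra_devices.filter |$device, $attributes| {
    $attributes['role'] in ['asw', 'cr', 'cloudsw'] and $attributes['site'] == $::site
}.values.map |$device| {
    ["${device['primary_fqdn']}:${ports[$device['manufacturer']]}",
    {'subscriptions' => $targets_sub[$device['manufacturer']]}]
})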

That works well for the most part. However, we have two "network POPs", each of which has only a single router on site and no other infrastructure, and thus no netflow VM. As a result, none of the existing netflow VMs collects stats for either of those two routers.

So we need to modify the above code so that we can manually add devices from specific non-local sites to certain VMs. For instance, we should collect the eqord (Chicago) stats on netflow1002 and the eqdfw ones on netflow2003. Hopefully this won't be too easy, though I'm not sure of the simplest way to structure the data/Puppet code to do it.

Event Timeline

cmooney triaged this task as Medium priority.

I'm assuming you meant "this won't be too hard", anyways the simplest solution off the top of my head would be to have a map network pop -> site in puppet and check against said map when picking devices.

Something else to consider is that metrics coming from said devices will have site label set to the site they are being scraped from. Not sure if this is a problem or not in practice though, as it breaks away a little from the current assumption/model.

> I'm assuming you meant "this won't be too hard", anyways the simplest solution off the top of my head would be to have a map network pop -> site in puppet and check against said map when picking devices.

Haha yeah. Hopefully it won't be too hard :P I'll look at it in the coming days.

> Something else to consider is that metrics coming from said devices will have site label set to the site they are being scraped from. Not sure if this is a problem or not in practice though, as it breaks away a little from the current assumption/model.

Yeah, there are some drawbacks, but we don't have much of an option. The site is in all the device hostnames, so we can use a regex on the "target" tag in the Prometheus metrics to restrict queries to a given site.
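
For illustration only, the "map network pop -> site" idea could be sketched roughly as below. This is not the merged change: the parameter name $pop_collector_sites is hypothetical, and its eqiad/codfw values are assumptions inferred from the netflow1002/netflow2003 host numbering. $infra_devices, $ports and $targets_sub are taken as-is from the existing class.

```puppet
# Hypothetical class parameter: network POP => site whose netflow VM polls it.
# The values are assumptions, not confirmed by this task.
Hash[String, String] $pop_collector_sites = {'eqord' => 'eqiad', 'eqdfw' => 'codfw'},

# In the class body: POPs assigned to this site, plus the local site itself.
$local_pops   = $pop_collector_sites.filter |$pop, $collector| { $collector == $::site }.keys
$polled_sites = [$::site] + $local_pops

# Same selection as the existing code, but a device now matches if its site is
# any of the polled sites rather than only $::site.
$targets = Hash($infra_devices.filter |$device, $attributes| {
    $attributes['role'] in ['asw', 'cr', 'cloudsw'] and $attributes['site'] in $polled_sites
}.values.map |$device| {
    ["${device['primary_fqdn']}:${ports[$device['manufacturer']]}",
    {'subscriptions' => $targets_sub[$device['manufacturer']]}]
})
```

One caveat with a site-keyed map: if a site ever has more than one netflow VM, each of them would poll the POP router, so keying the map by collector hostname instead may be preferable in that case.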

Change #1113853 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/puppet@production] Add gnmic collection for network POPs

https://gerrit.wikimedia.org/r/1113853

Change #1113853 merged by Cathal Mooney:

[operations/puppet@production] Add gnmic collection for network POPs

https://gerrit.wikimedia.org/r/1113853

cmooney claimed this task.

This is working now
