
Investigate Junos Prometheus exporter
Closed, ResolvedPublic

Description

junos_exporter seems like a useful tool for a few reasons:

  • Migrate away from Icinga (by replacing the various check_bgp, check_vrrp, check_ospf, etc)
  • (In the long term) migrate away from LibreNMS (the industry is moving to Prometheus-like tools)
  • Monitor more things, more easily (adding similar features to LibreNMS is quite complex)
  • Overall, leverage Prometheus/Grafana/AlertManager
  • Interesting features such as "Custom Label RegEx"

Especially if it's not too complex to deploy, it could help with our transition to streaming telemetry (where the industry seems to be heading, though it is not there yet) by getting data into Prometheus now and relying less on SNMP.

The main unknown for me so far is the impact of pulling this data over SSH. Does it keep the session open, or does it open a new one every minute?

It's moderately maintained: the latest release was in Nov 2022, but some recent patches have been merged into main.

Debs are provided which we could use at least for testing if not directly in prod.

Feedback welcome but I'd see it as:

  1. Generate a distinct key pair
  2. Add the key as read-only on some network devices (+ whatever is needed for firewall filter counters and policers)
  3. Install the deb on netmon1003 to experiment with it (ACLs already permit this host)
  4. Test it
  5. If the results are satisfactory, choose which host to run the exporter on (prometheus hosts?), puppetize, etc.
  6. Write dashboards + alerting
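
For step 4, a first sanity check could be as simple as parsing the exporter's text output and listing the sessions it reports as down. This is a minimal sketch, not something taken from junos_exporter: the metric name `example_bgp_session_up`, its labels, and the sample values are made up for illustration, and it assumes samples carry no trailing timestamp.

```python
from typing import Dict

# Hypothetical sample of Prometheus text exposition output.
# Metric and label names are invented for illustration only.
SAMPLE = """\
# HELP example_bgp_session_up Hypothetical metric, 1 if the session is established
# TYPE example_bgp_session_up gauge
example_bgp_session_up{target="router1.example.net",peer="192.0.2.1"} 1
example_bgp_session_up{target="router1.example.net",peer="192.0.2.2"} 0
"""


def parse_samples(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its value.

    Assumes one sample per line and no trailing timestamp.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Series whose value is 0, i.e. sessions reported as down.
down = [
    series
    for series, value in parse_samples(SAMPLE).items()
    if series.startswith("example_bgp_session_up") and value == 0
]
```

Against a real device the input would be whatever the exporter serves on its metrics endpoint, and the metric names would need to be taken from its actual output.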

Event Timeline


Thanks for the task. It does indeed look like a useful tool that could simplify adding monitoring without having to modify the LibreNMS codebase.

I'll see if I can do some tests with it and a vMX to get a sense of how it operates (your comments re the SSH sessions are indeed an important consideration). If it looks good there we can proceed as you set out and test from netmon1003 to some real devices.

I took a quick look at the exporter and it looks good to me too! Also +1 on the general testing/deployment plan.

Re: SSH, from a quick read through the code it seems that connections are re-used as needed (https://github.com/czerwonk/junos_exporter/blob/main/pkg/connector/connection_manager.go), so we should be good on that front.
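
To illustrate what connection re-use means for the devices, here is a minimal sketch of the general pattern: one cached connection per host, dialed only on first use. It is not a port of the exporter's connection manager; the `ConnectionManager` class and the `dial` factory are hypothetical names used only for this example.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

C = TypeVar("C")


class ConnectionManager(Generic[C]):
    """Hands out one cached connection per host, dialing only on first use."""

    def __init__(self, dial: Callable[[str], C]) -> None:
        # `dial` is a hypothetical factory, e.g. something that opens an SSH session.
        self._dial = dial
        self._conns: Dict[str, C] = {}
        self._lock = threading.Lock()

    def get(self, host: str) -> C:
        """Return the cached connection for host, creating it if needed."""
        with self._lock:
            if host not in self._conns:
                self._conns[host] = self._dial(host)
            return self._conns[host]

    def drop(self, host: str) -> None:
        """Forget a connection (e.g. after an error) so the next get() re-dials."""
        with self._lock:
            self._conns.pop(host, None)
```

With this pattern, repeated scrapes of the same device share a single session rather than opening a new one each minute, which is the behaviour we would want to confirm in testing.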

An alternative (or complement) here would be to go the gNMI way, probably through gnmic:
https://github.com/openconfig/gnmic
https://www.youtube.com/watch?v=v3CL2vrGD_8&t=2073s

It may be more raw than junos_exporter, which mangles some data, but it is also more future-proof (multi-platform, gNMI).

In that case T334594: TLS certificates for network devices is a prerequisite.

junos_exporter seems like a great tool, but we are going to move forward with pulling these stats using gnmic after successfully testing it under T326322. If we find any blockers that gNMI can't cover we can revisit using junos_exporter, but hopefully that won't be needed. Future gnmic pipeline development will be tracked in T369384: Productionize gnmic network telemetry pipeline.