Export Netbox Stats for DCops to a visualization tool
Closed, Resolved, Public

Description

Papaul has requested an automated system to export statistics from Netbox into a visualization tool (Prometheus/Grafana or Kibana).

After some discussion with observability, there are some possible solutions:

  1. Export aggregate data (counts by server type, manufacturer, and status), tagged with data center, to Prometheus via a small web application that queries the Netbox API and exposes the results as Prometheus metrics.
  2. Create an SQL-query-based dashboard in Grafana that queries the Netbox database directly.

The primary interest is in counts by model number and status, broken down by data center, so graphs of these counts are the main deliverable.
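
As a rough illustration of the first option, the aggregation step could look something like the sketch below. It is not the code that was merged for this task: the metric name `netbox_device_count`, the function names, and the sample records are all hypothetical, and a real exporter would fetch devices from the Netbox API instead of a hard-coded list. The sketch only shows counting devices per data center, model, and status, and rendering the counts in the Prometheus text exposition format.

```python
from collections import Counter

# Hypothetical sample records; a real exporter would fetch these from the Netbox API.
SAMPLE_DEVICES = [
    {"site": "codfw", "model": "model-a", "status": "active"},
    {"site": "codfw", "model": "model-a", "status": "active"},
    {"site": "codfw", "model": "model-b", "status": "planned"},
    {"site": "eqiad", "model": "model-a", "status": "failed"},
]


def count_devices(devices):
    """Count devices per (site, model, status) combination."""
    return Counter((d["site"], d["model"], d["status"]) for d in devices)


def escape_label(value):
    """Escape backslash, double quote and newline for a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(counts, name="netbox_device_count"):
    """Render the counts as one gauge in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} Number of Netbox devices by site, model and status.",
        f"# TYPE {name} gauge",
    ]
    for (site, model, status), value in sorted(counts.items()):
        labels = ",".join(
            f'{key}="{escape_label(val)}"'
            for key, val in (("site", site), ("model", model), ("status", status))
        )
        lines.append(f"{name}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(count_devices(SAMPLE_DEVICES)), end="")
```

Keeping the data center as a label rather than part of the metric name means a single Grafana query can group or filter by site, model, or status.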

Event Timeline

Change 574600 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/software/netbox-extras@master] Add support for getting Device status breakdowns

https://gerrit.wikimedia.org/r/574600

Change 574600 merged by CRusnov:
[operations/software/netbox-extras@master] Add support for getting Device status breakdowns

https://gerrit.wikimedia.org/r/574600

Change 575603 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] netbox: Add framework for exposing scripts to internal services

https://gerrit.wikimedia.org/r/575603

Change 576459 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] prometheus::ops: Add prometheus job to scrape Netbox scripts

https://gerrit.wikimedia.org/r/576459

This has patches pending final review and should be pushed to production soon.

Change 575603 merged by CRusnov:
[operations/puppet@production] netbox: Add framework for exposing scripts to internal services

https://gerrit.wikimedia.org/r/575603

Change 576459 merged by CRusnov:
[operations/puppet@production] prometheus::ops: Add prometheus job to scrape Netbox scripts

https://gerrit.wikimedia.org/r/576459

FYI, Prometheus is trying to query netbox2001.wikimedia.org:8443, but nothing is listening on that port, which is causing this alert:
https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=icinga1001&service=Prometheus+jobs+reduced+availability

Prometheus jobs reduced availability - job=netbox_device_statistics site=codfw

As it's a generic alert I only ACKed it for 2 days.
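
For anyone wanting to reproduce the "nothing listening" diagnosis, a minimal sketch of a TCP reachability check follows. The helper name `is_port_open` and the default timeout are assumptions for illustration. It only tests whether a TCP connection can be opened, not whether the service behind the port returns valid metrics.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the alert described above.
    print(is_port_open("netbox2001.wikimedia.org", 8443))
```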

ayounsi raised the priority of this task from Medium to High. May 21 2020, 10:03 AM

Thanks for the ping, I will get to it today.

Change 597851 had a related patch set uploaded (by CRusnov; owner: CRusnov):
[operations/puppet@production] netbox scriptproxy: Add listen stanza for 8443 for the scriptproxy

https://gerrit.wikimedia.org/r/597851

Change 597851 merged by CRusnov:
[operations/puppet@production] netbox scriptproxy: Add listen stanza for 8443 for the scriptproxy

https://gerrit.wikimedia.org/r/597851

Okay, this should be fixed.

I have fixed the issue with nb2001. This should be resolved.