Netbox news
Status update and future plans after a major upgrade

Netbox is a tool used by all SREs, either directly or abstracted through cookbooks and various scripts. Managed by Infrastructure-Foundations, it went through a major (and much needed!) upgrade this past quarter, led by John Bond and myself with the help of Riccardo.

For historical context, around the release of Netbox 2.10 the project was forked, creating a new project, Nautobot. Netbox 2.10.4 was the last version compatible with both Netbox and the new fork, so we remained on it until we could evaluate our needs going forward. After discussions, we decided to stay on Netbox (see why).

Here is a rundown of the current and future improvements. This work was tracked in T296452: Upgrade Netbox to 3.2.

Infrastructure

  • 100% on Bullseye servers (previously Buster)
  • Behind our CDNs
  • Active/passive frontends
  • Separate internal vs. external endpoints (creation of netbox.discovery.wmnet)
  • Documentation fully refactored and cleaned up: have a look at Netbox - Wikitech
  • New “single pane of glass” Grafana dashboard for Netbox health monitoring https://grafana.wikimedia.org/d/DvXT6LCnk/netbox

All the above improvements will contribute to having a rock solid source of truth as we come to rely on it more and more across SRE.

New changes already visible and used in prod

  • Redesigned UI
    • Including a dark mode
    • Better filtering on pretty much all the pages
  • Group support for Ganeti clusters - T262446
    • Helps model our infrastructure better,
    • Ties in John’s work to expose Netbox data in Puppet - see T229397,
    • Will allow hosts that rely on row/rack data to no longer need hardcoded values (e.g. Kubernetes)
  • Improved reports (reports are automated functions that alert on data inconsistencies based on our own rules and conventions):
    • Network interface MTU misconfigurations
    • Better error logging for reports
  • End-to-end path tracing now traverses circuits
  • Improved contact management
  • New AS number model
  • Improved custom fields support (custom fields are a way to extend the default models to store WMF-specific data, for example the procurement task for a server)
    • Extended to most of the models (for example, adding a purchase date to an inventory item)
    • Custom fields can now reference other objects (for example, to link a row to a Ganeti cluster)
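
To make the idea of a report more concrete, here is a minimal, standalone sketch of an MTU consistency check. It does not use Netbox itself: interfaces are plain Python objects, and the EXPECTED_MTU table, the role names and the sample device names are hypothetical, not our actual conventions. In a real Netbox report, the same logic would roughly live in a test_* method of a class subclassing Netbox's Report and would call its log_failure() method instead of returning tuples.

```python
"""Hypothetical sketch of a Netbox-style report check for MTU misconfigurations.

Plain Python, no Netbox dependency. The roles and expected MTUs below are
illustrative assumptions, not WMF's actual conventions.
"""
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical rule: the MTU every interface with a given role should carry.
EXPECTED_MTU = {"core": 9192, "access": 9192, "mgmt": 1500}


@dataclass
class Interface:
    device: str
    name: str
    role: str
    mtu: Optional[int]  # None means "MTU not set in Netbox"


def check_mtu(interfaces: List[Interface]) -> List[Tuple[Interface, str]]:
    """Return (interface, message) pairs, like the failures a report would log."""
    failures = []
    for iface in interfaces:
        expected = EXPECTED_MTU.get(iface.role)
        if expected is None:
            continue  # no rule for this role, nothing to check
        if iface.mtu is None:
            failures.append((iface, "MTU not set"))
        elif iface.mtu != expected:
            failures.append((iface, f"MTU {iface.mtu} != expected {expected}"))
    return failures


if __name__ == "__main__":
    # Hypothetical sample data
    sample = [
        Interface("switch1", "et-0/0/0", "core", 9192),
        Interface("switch1", "et-0/0/1", "core", 1500),
        Interface("switch2", "ge-0/0/0", "access", None),
        Interface("router1", "fxp0", "mgmt", 1500),
    ]
    for iface, msg in check_mtu(sample):
        print(f"FAIL {iface.device}:{iface.name}: {msg}")
```

Encoding the rule as data (a role-to-MTU table) keeps the check itself generic, so adding a new convention means adding a table entry rather than new code.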

While we’ve been busy with the above, there is much more to evaluate and possibly implement. A complete list is available at https://wikitech.wikimedia.org/wiki/Netbox#Future_improvements; many of them are good first bugs if you’re interested in learning more and contributing to our setup.

Here are some highlights:

  • GraphQL API - T310577
    • Some scripts are known for their slowness (e.g. the DNS cookbook); GraphQL should speed them up
  • Basic change rollback - T310589
    • If deemed viable, this will help quickly roll back accidental edits and deletions
  • Use Custom Model Validation - T310590
    • Long requested by DCops, this will help catch data-entry mistakes (e.g. typos) before they happen, rather than waiting for a report to flag them
  • Make more extensive use of Netbox custom fields - T305126
    • This will allow us to move data from YAML files and free-form “description” fields to structured Netbox fields
  • Represent sub-interface and bridge device associations in Netbox - T296832
    • This will allow us to document some edge cases in some servers’ configurations,
    • And to model our network devices better,
    • Together, these will help us improve our automation
  • Using a central Redis instance - T311385
    • Prerequisite for active/active frontends (faster and more reliable)
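
As a rough illustration of why GraphQL can help, the sketch below builds a single POST to Netbox's /graphql/ endpoint that asks for the devices at one site together with related fields (primary IPv4 address, rack), data that over REST would typically take several calls. It only uses the Python standard library; the base URL, token and site value are placeholders, and the device_list(site: ...) query and its field names are assumptions to check against the GraphQL schema of the deployed Netbox version.

```python
"""Hypothetical sketch: fetch devices and related data in one GraphQL request.

Assumptions: the base URL and token are placeholders, and the
`device_list(site: ...)` query and field names must be verified against
the schema of the Netbox version actually deployed.
"""
import json
import urllib.request


def build_request(base_url: str, token: str, site: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Netbox GraphQL endpoint."""
    # json.dumps quotes and escapes the site name for use as a string literal
    query = (
        "{ device_list(site: %s) { name primary_ip4 { address } rack { name } } }"
        % json.dumps(site)
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/graphql/",
        data=body,
        headers={
            "Authorization": f"Token {token}",  # Netbox API token auth
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_devices(base_url: str, token: str, site: str) -> list:
    """Send the request and return the list of devices, raising on GraphQL errors."""
    req = build_request(base_url, token, site)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["device_list"]
```

Calling fetch_devices("https://netbox.example.org", token, "site-a") would return one dict per device in a single round trip; how much this actually speeds up a given cookbook depends on how many REST calls it makes today.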

We hope that this background work will improve your SRE experience by making this abstraction of our real-life infrastructure faster, more complete and more reliable.

As usual, don’t hesitate to reach out to Infrastructure-Foundations for any issues, requests or suggestions.


NOTE: This text was sent to ops-private@ and is saved here for public archival.

Header image from Unsplash

Written by ayounsi on Jun 28 2022, 11:27 AM.
Staff Network SRE