
Evaluate NetBox as a Racktables replacement & IPAM
Closed, ResolvedPublic

Description

@ayounsi has proposed NetBox as an IPAM and Racktables replacement. It's a Django app written by DigitalOcean, and it looks really promising. It seems to have more features than servermon, and given our minimal time investment in servermon, it looks likely to be both more sustainable and to cover more use cases than the ones we already had in mind.

Since v2, it also supports a fully read/write REST API, which will certainly come in useful in a few different ways (e.g. polling a server's location from Puppet, pulling IPAM data in netops' configuration management, potentially integrating with our hardware provisioning workflows etc.).
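As an illustration of the "polling a server's location" use case, here is a minimal Python sketch. It assumes the NetBox v2 REST API exposes devices at `/api/dcim/devices/` with a `name` filter, token authentication via an `Authorization: Token ...` header, and `site`, `rack` and `position` fields on each device. The base URL, hostnames and function names are hypothetical, and the field names should be checked against the deployed version.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL, used for illustration only.
NETBOX_URL = "https://netbox.example.org"


def build_device_request(name, token, base_url=NETBOX_URL):
    """Build a GET request that looks up a device by its exact name."""
    query = urllib.parse.urlencode({"name": name})
    url = f"{base_url}/api/dcim/devices/?{query}"
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def extract_location(payload):
    """Reduce a paginated device listing to site, rack and position.

    Returns None unless exactly one device matched.
    """
    results = payload.get("results", [])
    if len(results) != 1:
        return None
    device = results[0]
    site = device.get("site") or {}
    rack = device.get("rack") or {}
    return {
        "site": site.get("slug"),
        "rack": rack.get("name"),
        "position": device.get("position"),
    }


def fetch_location(name, token):
    """Query the API over the network and return the device location."""
    request = build_device_request(name, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_location(json.load(response))
```

Keeping the parsing separate from the HTTP call means a consumer such as a Puppet fact or external node classifier could be tested against canned responses without a live instance.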

@ayounsi has set up a test instance in Labs already. We should evaluate it further and attempt to script a data migration, to see if everything we are planning to do with it can be done. T150651, originally worked on by @akosiaris for servermon, will probably be an issue here too.

Prod instance ( http://netbox.wikimedia.org ) has been up for a while and tested. Next steps are:

  • Get confirmation from DCops that they are ready to use it [@RobH]
  • Set up backups [T190184]
  • Select a cut-over date [@faidon]
  • Set Racktables to read-only mode
  • Do Racktables -> Netbox migration [@faidon]
  • Verify imported data
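One possible way to approach the "Verify imported data" step is to diff what was exported from Racktables against what ended up in Netbox. The sketch below is not the actual migration tooling. It assumes both inventories have already been reduced to mappings of hostname to rack label, and the function name and sample data are hypothetical.

```python
def diff_inventories(racktables, netbox):
    """Compare two {hostname: rack} mappings and report discrepancies."""
    source_names = set(racktables)
    imported_names = set(netbox)
    return {
        # Present in Racktables but absent from Netbox.
        "missing": sorted(source_names - imported_names),
        # Present in Netbox but never exported from Racktables.
        "unexpected": sorted(imported_names - source_names),
        # Present in both, but assigned to different racks.
        "mismatched": sorted(
            name
            for name in source_names & imported_names
            if racktables[name] != netbox[name]
        ),
    }


if __name__ == "__main__":
    # Hypothetical sample data.
    old = {"host1001": "A1", "host1002": "A2", "host1003": "B4"}
    new = {"host1001": "A1", "host1002": "A3", "host1004": "C1"}
    print(diff_inventories(old, new))
```

An empty list under each of the three keys would indicate that, for the compared attribute, the import matches the source.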

Event Timeline

Test instance upgraded to the latest master.

Change 387880 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] [WIP] Puppetize Netbox

https://gerrit.wikimedia.org/r/387880

Change 387880 merged by Ayounsi:
[operations/puppet@production] Puppetize Netbox

https://gerrit.wikimedia.org/r/387880

resolved? (https://netbox.wikimedia.org/login/?next=/)

@ayounsi

P.S. Just a note that there is a puppet error on netmon2001 that doesn't exist on netmon1002. It is related to not finding a Postgresql::User; it seems the user creation itself was a manual step.

Error: Could not find user postgres
Error: Could not find command '/usr/bin/pg_basebackup'
Error: /Stage[main]/Profile::Netbox/Postgresql::User[prometheus@localhost]/Exec[create_user-prometheus@localhost]: Could not evaluate: Could not find command '/usr/bin/psql'

No, not resolved yet, but in progress :) You're absolutely right that we haven't updated this task, though (my fault!)

Current progress is:

  • Netbox has been productionized (puppet, LDAP integration etc.) and deployed,
  • A first/initial test import of Racktables for the DCIM part has been performed,
  • A small number of issues with that import have been found and are being addressed,
  • Meanwhile, we need to continue the evaluation and find other potential issues or roadblocks (and especially by our DC Opsens)

While trying to fix the issues after the reboot for the kernel upgrade, I've opened T184634.
But now it seems that the Postgres DB is empty (no tables in the netbox DB). I'm not sure whether it was emptied as part of some of the tests above, or whether the reboot combined with the broken puppet run did this.

Netbox has been upgraded to 2.3.1, which supports virtual chassis switches.

Updating description for next steps.

This is now part of this quarter's goals, so I'm moving it to be a child of T199083.

The subtask to set up backups is now resolved. This included testing a restore of files from the Bacula console back to both netmon servers, as well as dropping the psql database for netbox and then restoring it from one of the dump files.

As a side effect, the new class postgresql::backup can be used by other services using postgres where needed.

Volans updated the task description.