Evaluate NetBox as a Racktables replacement & IPAM
Open, Normal, Public

Description

@ayounsi has proposed NetBox as an IPAM and Racktables replacement. It's a Django app written by DigitalOcean, and it looks really promising. It seems to have more features than servermon and, given our minimal time investment in servermon, it looks more likely to be both more sustainable and to cover more use cases than the ones we had already been considering.

Since v2, it also supports a fully read/write REST API, which will certainly come in useful in a few different ways (e.g. polling a server's location from Puppet, pulling IPAM data in netops' configuration management, potentially integrating with our hardware provisioning workflows etc.).
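As an illustration of the Puppet use case, here is a minimal Python sketch (standard library only) of looking up a server's location through the REST API. It assumes token authentication and the `/api/dcim/devices/` endpoint returning nested `site` and `rack` objects. The base URL, token and helper names are hypothetical placeholders. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: substitute the real instance URL and an API token.
NETBOX_URL = "https://netbox.example.org"
API_TOKEN = "0123456789abcdef"


def device_lookup_request(name, base_url=NETBOX_URL, token=API_TOKEN):
    """Build (but do not send) a GET request filtering devices by name."""
    query = urllib.parse.urlencode({"name": name})
    return urllib.request.Request(
        f"{base_url}/api/dcim/devices/?{query}",
        headers={
            # Assumed: NetBox token authentication via the Authorization header.
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
    )


def device_location(response_body):
    """Extract (site slug, rack name, rack position) from a device list response."""
    data = json.loads(response_body)
    if data["count"] != 1:
        raise LookupError(f"expected exactly one device, got {data['count']}")
    device = data["results"][0]
    rack = device.get("rack")  # unracked devices have no rack
    return (
        device["site"]["slug"],
        rack["name"] if rack else None,
        device.get("position"),
    )
```

Sending the request would be `urllib.request.urlopen(req)`, with the response body passed to `device_location()`. The exact response fields should be checked against the deployed NetBox version.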

@ayounsi has set up a test instance in Labs already. We should evaluate it further and attempt to script a data migration, to see if everything we are planning to do with it can be done. T150651, originally worked on by @akosiaris for servermon, will probably be an issue here too.
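A scripted migration could start from something like the following minimal sketch. It assumes the Racktables data has already been exported into plain dicts (the keys `name`, `model`, `role` and `site` are hypothetical). It also assumes that device types, roles and sites already exist in NetBox, with their IDs collected into lookup tables. The payload field names `device_type`, `device_role` and `site` follow the v2-era device endpoint and should be verified against the test instance.

```python
import json
import urllib.request


def to_netbox_payload(rt_object, type_ids, role_ids, site_ids):
    """Map one exported Racktables object to a NetBox device payload.

    Hypothetical input: rt_object has "name", "model", "role" and "site" keys;
    the *_ids dicts map those values to existing NetBox object IDs.
    Raises KeyError if a referenced type, role or site has no NetBox match.
    """
    return {
        "name": rt_object["name"],
        "device_type": type_ids[rt_object["model"]],
        "device_role": role_ids[rt_object["role"]],
        "site": site_ids[rt_object["site"]],
    }


def create_device_request(payload, base_url, token):
    """Build (but do not send) a POST to the devices endpoint."""
    return urllib.request.Request(
        f"{base_url}/api/dcim/devices/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def migrate(rt_objects, type_ids, role_ids, site_ids):
    """Split exported objects into ready payloads and unmapped failures."""
    payloads, failures = [], []
    for obj in rt_objects:
        try:
            payloads.append(to_netbox_payload(obj, type_ids, role_ids, site_ids))
        except KeyError as missing:
            # Record which object failed and which value had no NetBox match.
            failures.append((obj.get("name"), str(missing)))
    return payloads, failures
```

Collecting failures rather than aborting on the first one makes it easier to see, in a single dry run, which Racktables data has no NetBox counterpart yet.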

faidon created this task. Jul 10 2017, 1:16 PM
Restricted Application added a subscriber: Aklapper. Jul 10 2017, 1:16 PM
faidon moved this task from Backlog to Up next on the monitoring board. Jul 10 2017, 1:17 PM
ayounsi moved this task from Backlog to Watching on the netops board. Jul 12 2017, 7:18 PM

Test instance upgraded to the latest master.

Change 387880 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] [WIP] Puppetize Netbox

https://gerrit.wikimedia.org/r/387880

Change 387880 merged by Ayounsi:
[operations/puppet@production] Puppetize Netbox

https://gerrit.wikimedia.org/r/387880

Dzahn added a subscriber: Dzahn. Wed, Jan 3, 9:56 PM

resolved? (https://netbox.wikimedia.org/login/?next=/)

@ayounsi

P.S. Just a note that there is a Puppet error on netmon2001, which doesn't occur on netmon1002, related to not finding a Postgresql::User. It seems the user creation itself was a manual step.

Error: Could not find user postgres
Error: Could not find command '/usr/bin/pg_basebackup'
Error: /Stage[main]/Profile::Netbox/Postgresql::User[prometheus@localhost]/Exec[create_user-prometheus@localhost]: Could not evaluate: Could not find command '/usr/bin/psql'

No, not resolved yet, but in progress :) You're absolutely right that we haven't updated this task, though (my fault!)

Current progress is:

  • Netbox has been productionized (Puppet, LDAP integration etc.) and deployed,
  • A first/initial test import of Racktables for the DCIM part has been performed,
  • A small number of issues with that import have been found and are being addressed,
  • Meanwhile, we need to continue the evaluation and find other potential issues or roadblocks (especially by our DC Opsens).

Volans added a subscriber: Volans. Thu, Jan 11, 9:43 AM

While trying to fix the issues after the reboot for the kernel upgrade, I've opened T184634.
But now it seems that the Postgres DB is empty (no tables in the netbox DB). I'm not sure whether it was emptied as part of some of the tests above, or whether the reboot combined with the broken Puppet run did this.