| Status | Assigned | Task |
|---|---|---|
| Open | None | T116063 Hardware Automation Workflow - Overall Tracking |
| Resolved | None | T199083 Migrate the hardware inventory from Racktables to Netbox |
| Resolved | faidon | T170144 Evaluate NetBox as a Racktables replacement & IPAM |
| Resolved | ayounsi | T184242 Puppet broken on deployment-netbox, looks like it thinks its a prod box |
| Resolved | ayounsi | T190134 Can't login on netbox |
| Resolved | Dzahn | T190184 Netbox: setup backups |
The changes above will a) turn the netbox hosts into "backup::hosts" via a profile, so they get all the needed Bacula agent setup, and b) define a new fileset (the paths to be backed up for netbox) and apply it.
The missing part would then just be a cron job that dumps the PostgreSQL databases into a directory, plus adding that directory to the fileset.
There is now a puppetized directory and cron job on the netbox machines. Copy/pasting the command from there and running it manually as user postgres works and creates a gzipped dump file in /srv/postgres-backup (which I deleted again):
sudo -u postgres /usr/bin/pg_dumpall | gzip > /srv/postgres-backup/psql-all-dbs-`date "+%Y%m%d"`.sql.gz
root@netmon1002:/srv/postgres-backup# ls
psql-all-dbs-20180731.sql.gz
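One caveat with the one-liner above: without `pipefail`, the exit status of a pipeline is that of its last command (`gzip`), so a failing `pg_dumpall` can still leave a truncated file that looks like a normal dump. A minimal sketch of a more defensive wrapper, assuming bash; the function names `run_dump` and `dump_all` and the `.partial` temporary suffix are hypothetical and not part of the puppetized cron job.

```bash
#!/usr/bin/env bash
# Hypothetical, more defensive variant of the dump one-liner above.
# Not the puppetized script; names and defaults are illustrative only.
set -o pipefail   # a pipeline now fails if ANY stage fails, not just gzip

BACKUP_DIR="${BACKUP_DIR:-/srv/postgres-backup}"

# Writes an SQL dump of all databases to stdout (same command as in the ticket).
run_dump() {
    sudo -u postgres /usr/bin/pg_dumpall
}

# Dumps into a temporary file and renames it only on success, so a failed
# run never leaves a truncated psql-all-dbs-*.sql.gz behind.
dump_all() {
    local target tmp
    target="${BACKUP_DIR}/psql-all-dbs-$(date '+%Y%m%d').sql.gz"
    tmp="${target}.partial"
    if ! run_dump | gzip > "$tmp"; then
        rm -f -- "$tmp"
        echo "dump failed; previous dumps left untouched" >&2
        return 1
    fi
    mv -- "$tmp" "$target"
    printf '%s\n' "$target"
}
```

`dump_all` could then be called from the cron job instead of the bare pipeline; keeping `run_dump` as a separate function also makes it easy to stub out when testing.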
The path /srv/postgres-backup has been added to the Bacula fileset called "netbox".
Tomorrow I will double-check that the dump got created (it runs at a random time), that it shows up in the Bacula console and that it can be restored from there.
Finally, there should be no more cronspam :p ... and we have actual files:
root@netmon1002:/srv/postgres-backup# ls
psql-all-dbs-20180803.sql.gz
root@netmon2001:/srv/postgres-backup# ls
psql-all-dbs-20180803.sql.gz
Also, the dump and the cleanup cron jobs no longer run at the same time.
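The ticket does not show the cleanup job itself. A minimal sketch of what such a job could look like, assuming dumps older than a fixed number of days are deleted; the 15-day retention, the function name `cleanup_old_dumps` and the script paths in the crontab comment are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup job; the real retention period is not stated in the ticket.
BACKUP_DIR="${BACKUP_DIR:-/srv/postgres-backup}"
RETENTION_DAYS="${RETENTION_DAYS:-15}"   # assumed value, adjust as needed

cleanup_old_dumps() {
    # -mtime +N matches files whose age, rounded down to whole days, exceeds N.
    # The name pattern keeps the job from deleting anything but dump files.
    find "$BACKUP_DIR" -maxdepth 1 -type f -name 'psql-all-dbs-*.sql.gz' \
        -mtime "+${RETENTION_DAYS}" -print -delete
}

# Illustrative crontab entries (hypothetical script paths) that keep the
# dump and the cleanup from running at the same time:
#   15 2 * * *  /usr/local/bin/dump-all-dbs.sh
#   45 4 * * *  /usr/local/bin/cleanup-old-dumps.sh
```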
In bconsole on the Bacula server I can confirm that the new netbox backup sets exist and that /srv/postgres-backup is part of a fileset that can be restored from/to netmon1002 and netmon2001. The dump files that exist now should show up in the next incremental backup.
Files have been created for each day and now show up in bconsole on Bacula (helium):
$ cd postgres-backup/
cwd is: /srv/postgres-backup/
$ ls
psql-all-dbs-20180803.sql.gz
psql-all-dbs-20180804.sql.gz
psql-all-dbs-20180805.sql.gz
psql-all-dbs-20180806.sql.gz
I started a restore to netmon1002.
Using Catalog "production"
Run Restore job
JobName: RestoreFiles
Bootstrap: /var/lib/bacula/helium.eqiad.wmnet.restore.4.bsr
Where: /var/tmp/bacula-restores
..
Backup Client: netmon1002.wikimedia.org-fd
Restore Client: netmon1002.wikimedia.org-fd
..
Job queued. JobId=103268
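Before using a restored dump it is worth checking that the files are intact. Bacula recreates the original directory structure below the `Where` path, so this job's files should end up under /var/tmp/bacula-restores/srv/postgres-backup. A minimal sketch, assuming bash and GNU gzip; `verify_restored_dumps` is a hypothetical helper, and `gzip -t` only checks the compressed stream, not whether the SQL inside loads cleanly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify files restored by Bacula before using them.
# Bacula recreates the original path below the "Where" directory.
RESTORE_DIR="${RESTORE_DIR:-/var/tmp/bacula-restores/srv/postgres-backup}"

verify_restored_dumps() {
    local f bad=0 found=0
    for f in "$RESTORE_DIR"/psql-all-dbs-*.sql.gz; do
        [ -e "$f" ] || continue   # glob matched nothing
        found=1
        # gzip -t decompresses in memory and checks the CRC; nothing is written.
        if gzip -t -- "$f" 2>/dev/null; then
            echo "ok      $f"
        else
            echo "CORRUPT $f"
            bad=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no dumps found in $RESTORE_DIR" >&2
        return 1
    fi
    return "$bad"
}
```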
I also tested actual restore of the database:
I dropped the live production database, confirmed the web UI was down, then restored the DB from a dump file taken from the backups, and the application was up again.
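The exact restore commands are documented on the wikitech page rather than in this ticket. For a dump made with `pg_dumpall`, the PostgreSQL documentation restores it by feeding the SQL to `psql` connected to the `postgres` database, and recommends `-X` so no `~/.psqlrc` interferes. A minimal sketch of that, assuming bash and the file naming used above; `latest_dump`, `run_psql` and `restore_dump` are hypothetical helpers. `ON_ERROR_STOP` is deliberately not set, because a `pg_dumpall` script also recreates roles and will typically report "already exists" errors for objects still present on the server.

```bash
#!/usr/bin/env bash
# Hypothetical restore helper; the documented procedure lives on wikitech.
set -o pipefail

DUMP_DIR="${DUMP_DIR:-/srv/postgres-backup}"

# Prints the newest dump; the YYYYMMDD suffix makes name order == date order.
latest_dump() {
    local files=("$DUMP_DIR"/psql-all-dbs-*.sql.gz)
    [ -e "${files[0]}" ] || return 1   # glob matched nothing
    printf '%s\n' "${files[@]}" | sort | tail -n 1
}

# Reads SQL on stdin and runs it as the postgres superuser.
run_psql() {
    sudo -u postgres psql -X -d postgres
}

# Restores the given dump file, or the newest one if no argument is passed.
restore_dump() {
    local dump="${1:-}"
    if [ -z "$dump" ]; then
        dump=$(latest_dump) || { echo "no dump found" >&2; return 1; }
    fi
    echo "restoring $dump" >&2
    gzip -dc -- "$dump" | run_psql
}
```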
I created a wikitech page for Netbox, including a section on backups and restore:
This should resolve the ticket.