Netbox: setup backups
Closed, Resolved · Public

Description

Before we start using Netbox as a source of truth in production we need to set up its backups. I think it's only the PostgreSQL DB, but I'll let @ayounsi confirm.

Event Timeline

Volans triaged this task as Medium priority. Mar 20 2018, 4:48 PM
Volans created this task.
Restricted Application removed a project: Patch-For-Review. Mar 20 2018, 4:48 PM

Correct, plus:
/srv/deployment/netbox/deploy/netbox/netbox/media/
and
/srv/deployment/netbox/deploy/netbox/netbox/reports/

As Netbox will be in full production as part of the quarterly goal, this needs to be done before that.

Change 447744 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] netbox: have Bacula backups of netbox hosts

https://gerrit.wikimedia.org/r/447744

Change 447747 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] netbox: define a Bacula fileset and apply it

https://gerrit.wikimedia.org/r/447747

The changes above will a) turn netbox hosts into "backup::hosts" via profile so they get all the needed Bacula agent stuff and then b) define a new fileset (the paths to be backed up for netbox) and apply it.

The missing part would then just be a cron to dump pgsql into a path and add that path to the fileset.

Dzahn claimed this task. Jul 25 2018, 12:58 AM

Change 447744 merged by Dzahn:
[operations/puppet@production] netbox: have Bacula backups of netbox hosts

https://gerrit.wikimedia.org/r/447744

Change 447747 merged by Dzahn:
[operations/puppet@production] netbox: define a new Bacula fileset and apply it

https://gerrit.wikimedia.org/r/447747

Change 447842 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] netbox: add psql dump cron and back it up

https://gerrit.wikimedia.org/r/447842

Change 447844 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] postgresql: add defined type to create db backups

https://gerrit.wikimedia.org/r/447844

Change 447844 merged by Dzahn:
[operations/puppet@production] postgresql: add class to create db backups

https://gerrit.wikimedia.org/r/447844

Change 447842 merged by Dzahn:
[operations/puppet@production] netbox: add psql backups

https://gerrit.wikimedia.org/r/447842

Change 449387 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] postgresql::backups: fix typo and amend usage notes

https://gerrit.wikimedia.org/r/449387

Change 449387 merged by Dzahn:
[operations/puppet@production] postgresql::backups: fix typo and amend usage notes

https://gerrit.wikimedia.org/r/449387

Dzahn added a comment. Jul 31 2018, 1:28 AM

There is now a puppetized dir and cron job on netbox machines. Copy/pasting the command from there and executing it manually as user postgres works and creates a gzipped dump file in /srv/postgres-backup (which I deleted again).

sudo -u postgres /usr/bin/pg_dumpall | gzip > /srv/postgres-backup/psql-all-dbs-`date "+%Y%m%d"`.sql.gz
root@netmon1002:/srv/postgres-backup# ls
psql-all-dbs-20180731.sql.gz
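
The manual command above could also be wrapped in a small script. What follows is only a sketch under assumptions, not the puppetized cron job: the function names `backup_name` and `dump_all` are hypothetical, and it assumes bash, `pg_dumpall` on the PATH, and a user that can both connect as a PostgreSQL superuser and write to the backup directory. It enables `pipefail` so that a failing `pg_dumpall` is not masked by a successful `gzip`, and it writes to a temporary file first so a partial dump never carries the final name.

```shell
#!/bin/bash
# Sketch only; not the actual cron job from operations/puppet.
set -o pipefail

BACKUP_DIR="${BACKUP_DIR:-/srv/postgres-backup}"

# Build the dump file name for a given day (YYYYMMDD), defaulting to today.
backup_name() {
    local day="${1:-$(date +%Y%m%d)}"
    printf 'psql-all-dbs-%s.sql.gz\n' "$day"
}

# Dump all databases to a temporary file and rename it only on success.
# Assumes the calling user may run pg_dumpall and write to BACKUP_DIR.
dump_all() {
    local target tmp
    target="${BACKUP_DIR}/$(backup_name)"
    tmp="${target}.tmp"
    if pg_dumpall | gzip > "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Keeping the date formatting inside a script also means the crontab entry itself would not need to contain any `%` characters.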

The path /srv/postgres-backup has been added to the Bacula fileset called "netbox".

Tomorrow i will double-confirm it got created (at a random time), shows up in Bacula console and can be restored from there.

Change 449409 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] Fix typo in postgresql::backup

https://gerrit.wikimedia.org/r/449409

Change 449409 merged by Volans:
[operations/puppet@production] Fix typo in postgresql::backup

https://gerrit.wikimedia.org/r/449409

Change 449411 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] Fix typo in postgresql::backup (2)

https://gerrit.wikimedia.org/r/449411

Change 449411 merged by Volans:
[operations/puppet@production] Fix typo in postgresql::backup (2)

https://gerrit.wikimedia.org/r/449411

Change 449607 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] bacula/postgresql: add a generic fileset for psql

https://gerrit.wikimedia.org/r/449607

Change 449607 merged by Dzahn:
[operations/puppet@production] bacula/postgresql: add a generic fileset for psql

https://gerrit.wikimedia.org/r/449607

Change 449874 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] postgresql::backup: fix "EOF in backquote substitution"

https://gerrit.wikimedia.org/r/449874

Change 449874 merged by Dzahn:
[operations/puppet@production] postgresql::backup: fix "EOF in backquote substitution"

https://gerrit.wikimedia.org/r/449874

Change 450061 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] posgresql::backup: fix "Unterminated quoted string" in cron tab

https://gerrit.wikimedia.org/r/450061

Change 450061 merged by Dzahn:
[operations/puppet@production] posgresql::backup: fix "Unterminated quoted string" in cron tab

https://gerrit.wikimedia.org/r/450061
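
Both error messages named in the two fixes above are what /bin/sh typically reports when a command is cut short in the middle of a quoted string or backquote. One common way this happens in a crontab is an unescaped `%`: per crontab(5), an unescaped `%` in the command field is turned into a newline and everything after the first one is passed to the command as standard input. Whether that was the exact cause here is an assumption, since the patches themselves are not quoted in this task. A minimal sketch of the escaping, using a hypothetical helper named `cron_escape`:

```shell
#!/bin/bash
# Sketch only: escape "%" so that cron passes the command through unchanged.
# cron_escape is a hypothetical helper, not part of the puppet module.
cron_escape() {
    printf '%s\n' "$1" | sed 's/%/\\%/g'
}

# Prints the date call with every "%" preceded by a backslash.
cron_escape 'date "+%Y%m%d"'
```

Moving the whole command into a script file, so the crontab entry contains no `%` at all, is another common way around the problem.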

Change 450257 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] postgresql::backup: don't run both crons at same minute

https://gerrit.wikimedia.org/r/450257

Change 450261 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] postgresql::dump: unify commands in a single cron job

https://gerrit.wikimedia.org/r/450261

Change 450257 merged by Dzahn:
[operations/puppet@production] postgresql::backup: don't run both crons at same minute

https://gerrit.wikimedia.org/r/450257

Dzahn added a comment. Aug 4 2018, 12:50 AM

Finally there should be no more cronspam :p ... and we have actual files:

root@netmon1002:/srv/postgres-backup# ls
psql-all-dbs-20180803.sql.gz

root@netmon2001:/srv/postgres-backup# ls
psql-all-dbs-20180803.sql.gz

Also the dump and the cleanup cron don't run at the same time anymore.
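
The cleanup job mentioned above is not quoted in this task. As a rough sketch of what such a job might look like, assuming GNU find and a 7-day retention picked purely for illustration (`KEEP_DAYS` and `cleanup_old_dumps` are made-up names, and the real retention period is not stated here):

```shell
#!/bin/bash
# Sketch only: remove dump files older than a retention period.
# The 7-day default is an assumption, not the value used in production.
BACKUP_DIR="${BACKUP_DIR:-/srv/postgres-backup}"
KEEP_DAYS="${KEEP_DAYS:-7}"

cleanup_old_dumps() {
    # -mtime +N matches files whose age, rounded down to whole days,
    # is greater than N. -maxdepth and -delete are GNU/BSD extensions.
    find "$BACKUP_DIR" -maxdepth 1 -type f -name 'psql-all-dbs-*.sql.gz' \
        -mtime +"$KEEP_DAYS" -delete
}
```

Scheduling this at a different minute than the dump, as the change above does for the real crons, avoids the cleanup touching the directory while a new dump is being written.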

On the Bacula host, in bconsole, I can confirm the new netbox backup sets exist and /srv/postgres-backup is part of a fileset that could be restored from/to netmon1002 and netmon2001 as well. The actual dump files that exist now should appear shortly in the next incremental backup.

Dzahn added a comment. Aug 6 2018, 3:45 PM

Files have been created for each day and are showing up in Bacula (helium) in bconsole now.

(The how-to is at: https://wikitech.wikimedia.org/wiki/Bacula#Restore_(aka_Panic_mode))

bconsole:

$ cd postgres-backup/ 
cwd is: /srv/postgres-backup/
$ ls
psql-all-dbs-20180803.sql.gz
psql-all-dbs-20180804.sql.gz
psql-all-dbs-20180805.sql.gz
psql-all-dbs-20180806.sql.gz

I started a restore to netmon1002.

Using Catalog "production"
Run Restore job
JobName:         RestoreFiles
Bootstrap:       /var/lib/bacula/helium.eqiad.wmnet.restore.4.bsr
Where:           /var/tmp/bacula-restores
..
Backup Client:   netmon1002.wikimedia.org-fd
Restore Client:  netmon1002.wikimedia.org-fd
..
Job queued. JobId=103268

Mentioned in SAL (#wikimedia-operations) [2018-08-07T18:24:08Z] <mutante> netbox - restored database from dump file - backed up and back-up (T190184)

Dzahn closed this task as Resolved. Aug 7 2018, 7:33 PM

I also tested actual restore of the database:

Dropped the live prod database, confirmed web UI was down, then restored DB from dumpfile from backups and the application was up again.
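
The exact commands used for that test are not recorded in this task; the authoritative steps are on the Netbox wikitech page. As a hedged sketch only: a `pg_dumpall` dump is plain SQL, so loading it generally amounts to feeding the decompressed file to `psql`. The function name `restore_dump` is hypothetical, and the sketch assumes it runs as a PostgreSQL superuser after the dump file has been restored from Bacula.

```shell
#!/bin/bash
# Sketch only; not the documented restore procedure.
set -o pipefail

# Feed a gzipped pg_dumpall file to psql.
# Assumes it runs as a PostgreSQL superuser (e.g. the postgres user).
restore_dump() {
    local dump="$1"
    # Refuse to start if the archive is missing or corrupt.
    gzip -t "$dump" || return 1
    # -X skips any personal psqlrc; "postgres" is the database to connect to.
    gunzip -c "$dump" | psql -X -d postgres
}
```

Checking the archive with `gzip -t` before touching the database is a cheap guard against starting a restore from a truncated file.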

I created a wikitech page for Netbox incl. a section on backups and restore:

https://wikitech.wikimedia.org/wiki/Netbox#Restore

This should resolve the ticket.

Change 463820 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] Netbox: set media directory

https://gerrit.wikimedia.org/r/463820

Change 463820 abandoned by Volans:
Netbox: set media directory

Reason:
Superseded by other patches by Cas to move directly to Swift

https://gerrit.wikimedia.org/r/463820