
Improve regular production database backups handling
Open, Medium, Public

Description

  • Documentation, documentation, documentation
  • More flexibility: if possible per-table CSVs
  • More flexibility: Physical backups
  • Better recovery documentation: "one line to recover" (there is now a recover_section.py)
  • Faster point in time recovery/premade tools
  • Better compression
  • Prepare based on name of the backup, not just the section
  • Option optimization (e.g. double the use_memory)
  • More detailed health checks of backups (size, failures, objects, ...), e.g. check that the size is within a percentage of the previous backup's size.
  • Identify failures after a timeout (X amount of time passed) and allow easy cleanup of leftover files (probably on T205627)
  • Purge old metadata and make sure logs are rotated T205627
  • Review and improve logging (beyond metadata)
  • 1 retry after initial failure
  • More optimization of certain database tables
  • Maybe some kind of locking of backups and/or transfer.py to prevent concurrent actions on the same source or target servers
  • Incremental/Differential backups T244884
  • Document (and potentially alert on) the last edit time of some sample tables (e.g. recentchanges or revision) to verify that the source databases are up to date (e.g. to detect a master or intermediate master with replication stopped, or some other issue causing recent backups to contain stale data)
  • Have a quick way to see which backup sources belong to each section (tendril, dashboard) (Done on puppet's site.pp)
  • Document and/or automate best server configuration for fast dump load (e.g. disable checksums, innodb transactionality, etc.)
  • Enable the possibility of editing per-table options such as the engine and compression
  • Work around the "myloader doesn't import empty dbs" bug
  • Binlog backups
  • Easier/automated recovery/backup testing
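
The size health check listed above (backup size within a percentage of the previous backup) could be sketched roughly as follows. This is only an illustration, not the wmfbackups implementation: the function name `size_within_tolerance` and the default 5% tolerance are assumptions made up for this example.

```python
def size_within_tolerance(current_bytes, previous_bytes, max_change_pct=5.0):
    """Return True if the current backup size differs from the previous
    one by at most max_change_pct percent (hypothetical helper)."""
    if previous_bytes <= 0:
        # No usable previous backup to compare against: report a failure
        # so that a human reviews the situation.
        return False
    change_pct = abs(current_bytes - previous_bytes) / previous_bytes * 100.0
    return change_pct <= max_change_pct


if __name__ == "__main__":
    # 3% growth compared to the previous run
    print(size_within_tolerance(1_030_000_000, 1_000_000_000))
    # 40% shrink compared to the previous run
    print(size_within_tolerance(600_000_000, 1_000_000_000))
```

A relative check like this catches both truncated backups (large shrink) and unexpected growth, but the tolerance would likely need tuning per section to avoid false positives.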

Details

Project                           Branch      Lines +/-
operations/puppet                 production  +2 -2
operations/puppet                 production  +82 -82
labs/private                      master      +1 -2
operations/puppet                 production  +2 -2
operations/puppet                 production  +24 -16
operations/software/wmfbackups    master      +53 -5
operations/puppet                 production  +7 -225
operations/puppet                 production  +9 -462
operations/puppet                 production  +8 -1 K
operations/puppet                 production  +15 -1 K
operations/software/wmfbackups    master      +4 -2
operations/software/wmfbackups    master      +5 -7
operations/software/wmfbackups    master      +1 -0
operations/software/wmfbackups    master      +10 -3
operations/puppet                 production  +10 -3
operations/software/wmfmariadbpy  master      +10 -3
operations/software/wmfbackups    master      +10 -3
operations/puppet                 production  +6 -3
operations/puppet                 production  +15 -19
operations/puppet                 production  +0 -2
operations/software/wmfmariadbpy  master      +2 -2
operations/puppet                 production  +9 -9
operations/puppet                 production  +3 -1
operations/puppet                 production  +2 -662
operations/puppet                 production  +2 -0
operations/puppet                 production  +1 -1
operations/puppet                 production  +2 -2
operations/software               master      +3 -3
operations/puppet                 production  +29 -28
operations/puppet                 production  +3 -2
operations/software/wmfmariadbpy  master      +2 -2
operations/puppet                 production  +2 -2
operations/puppet                 production  +19 -9
operations/puppet                 production  +63 -22
operations/puppet                 production  +8 -10
operations/puppet                 production  +2 -2
operations/puppet                 production  +1 -1
operations/puppet                 production  +1 -0
operations/puppet                 production  +3 -1
operations/puppet                 production  +1 -1
operations/puppet                 production  +1 -1
operations/puppet                 production  +1 -1
operations/puppet                 production  +10 -10
operations/puppet                 production  +8 -2
operations/puppet                 production  +0 -2
operations/puppet                 production  +13 -12

Related Objects

Status     Assigned
Resolved   RobH
Resolved   fgiunchedi
Open       None
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Declined   None
Resolved   jcrespo
Resolved   jcrespo
Open       None
Open       None
Open       None
Resolved   jcrespo
Resolved   Cmjohnson
Resolved   Cmjohnson
Resolved   Cmjohnson
Resolved   jcrespo
Resolved   Marostegui
Resolved   RobH
Resolved   Andrew
Resolved   Cmjohnson
Resolved   jcrespo
Resolved   Cmjohnson
Resolved   Cmjohnson
Resolved   jcrespo
Resolved   Cmjohnson
Resolved   jcrespo
Resolved   Papaul
Resolved   Marostegui
Resolved   RobH
Resolved   RobH
Resolved   jcrespo
Resolved   jcrespo
Resolved   None
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Declined   Marostegui
Resolved   mark
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Open       None
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Duplicate  None
Open       None
Open       None
Open       None
Open       None
Open       None
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Reedy
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Declined   None
Resolved   Marostegui
Resolved   Marostegui
Resolved   Ladsgroup
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Marostegui
Resolved   Ladsgroup
Resolved   Marostegui
Resolved   Marostegui
Open       None
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   jcrespo
Resolved   Papaul
Declined   None
Resolved   Guozr.im
Resolved   jcrespo
Resolved   Papaul
Resolved   jcrespo
Resolved   Papaul
Resolved   Cmjohnson
Resolved   Papaul
Resolved   jcrespo
Resolved   jcrespo
Resolved   L0st3xpl0r3r
Resolved   jcrespo
Resolved   Papaul
Resolved   Cmjohnson
Resolved   Papaul
Resolved   Cmjohnson
Open       None

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 579894 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579894

Change 579894 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579894

Change 579901 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579901

Change 579901 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579901

Change 582791 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Skip Saturday snapshot to prevent a retention of 4 copies

https://gerrit.wikimedia.org/r/582791

Change 582791 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Skip Saturday snapshot to prevent a retention of 4 copies

https://gerrit.wikimedia.org/r/582791

Change 583049 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Stop replication for s3 when snapshotting

https://gerrit.wikimedia.org/r/583049

Change 583049 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Stop replication for s3 when snapshotting

https://gerrit.wikimedia.org/r/583049

Change 584599 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] partman: Reference bacula recipe (unused to prevent accidental formatting)

https://gerrit.wikimedia.org/r/584599

Change 584599 merged by Jcrespo:
[operations/puppet@production] partman: Reference bacula recipe (unused to prevent accidental formatting)

https://gerrit.wikimedia.org/r/584599

Change 588668 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] bacula: Increase max total size of Databases backups to 40 TB

https://gerrit.wikimedia.org/r/588668

Change 588668 merged by Jcrespo:
[operations/puppet@production] bacula: Increase max total size of Databases backups to 40 TB

https://gerrit.wikimedia.org/r/588668

Change 589266 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move all backup config templates to its own subdir

https://gerrit.wikimedia.org/r/589266

Change 589266 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move all backup config templates to its own subdir

https://gerrit.wikimedia.org/r/589266

Change 591309 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move OK alert description to wikitech

https://gerrit.wikimedia.org/r/591309

Change 591309 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move OK alert description to wikitech

https://gerrit.wikimedia.org/r/591309

Change 591326 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Check backups size also based on previous runs

https://gerrit.wikimedia.org/r/591326

Change 591326 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Check backups size also based on previous runs

https://gerrit.wikimedia.org/r/591326

Change 592599 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Tune backup size monitoring thresholds

https://gerrit.wikimedia.org/r/592599

Change 592599 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Tune backup size monitoring thresholds

https://gerrit.wikimedia.org/r/592599

jcrespo updated the task description. Apr 27 2020, 8:20 AM

Change 592608 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/592608

Change 592608 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/592608

Change 594096 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] transfer.py: Set timeout to 5 minutes

https://gerrit.wikimedia.org/r/594096

Change 594096 merged by Jcrespo:
[operations/software/wmfmariadbpy@master] transfer.py: Set timeout to 5 minutes

https://gerrit.wikimedia.org/r/594096

Change 594099 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/594099

Change 594099 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/594099

Change 596189 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover zarcillo from db1115 to db2093

https://gerrit.wikimedia.org/r/596189

Change 596197 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbtools: point zarcillo database scripts to use db2093

https://gerrit.wikimedia.org/r/596197

Change 596189 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover zarcillo from db1115 to db2093

https://gerrit.wikimedia.org/r/596189

Change 596197 merged by Jcrespo:
[operations/software@master] dbtools: point zarcillo database scripts to use db2093

https://gerrit.wikimedia.org/r/596197

Change 596242 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: switch monitoring of tendril/zarcillo backups to codfw

https://gerrit.wikimedia.org/r/596242

Change 596242 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: switch monitoring of tendril/zarcillo backups to codfw

https://gerrit.wikimedia.org/r/596242

Change 596251 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Allow zarcillo as a valid backup section

https://gerrit.wikimedia.org/r/596251

Change 596251 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Allow zarcillo as a valid backup section

https://gerrit.wikimedia.org/r/596251

Change 597005 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Disable monitoring screens on database backup hosts

https://gerrit.wikimedia.org/r/597005

Change 597005 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Disable monitoring screens on database backup hosts

https://gerrit.wikimedia.org/r/597005

Change 608053 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move transferpy deployment to debian package

https://gerrit.wikimedia.org/r/608053

Change 608053 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move transferpy deployment to debian package

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608053

Change 609101 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Fix remote_backup_mariadb.py deps and path

https://gerrit.wikimedia.org/r/c/operations/puppet/+/609101

Change 609101 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Fix remote_backup_mariadb.py deps and path

https://gerrit.wikimedia.org/r/c/operations/puppet/+/609101

Change 615155 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Adjust check parameters to get less false positives

https://gerrit.wikimedia.org/r/615155

Change 615155 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Adjust check parameters to get less false positives

https://gerrit.wikimedia.org/r/615155

Change 618720 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Reenable notifications for dbprov2003 after maintenance

https://gerrit.wikimedia.org/r/618720

Change 618722 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move x1 and misc logical dumps to dbprov1003

https://gerrit.wikimedia.org/r/618722

Change 618723 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] BackupStatistics: Do not raise an exception if metadata cannot be sent

https://gerrit.wikimedia.org/r/618723

Change 618723 abandoned by Jcrespo:
[operations/software/wmfmariadbpy@master] BackupStatistics: Do not raise an exception if metadata cannot be sent

Reason:
not a bug

https://gerrit.wikimedia.org/r/618723

Change 618720 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Reenable notifications for dbprov2003 after maintenance

https://gerrit.wikimedia.org/r/618720

Change 618722 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move x1 and misc logical dumps to dbprov1003

https://gerrit.wikimedia.org/r/618722

Change 620651 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Disable snapshots being sent to bacula

https://gerrit.wikimedia.org/r/620651

Change 620651 merged by Jcrespo:
[operations/puppet@production] mariadb: Disable snapshots being sent to bacula

https://gerrit.wikimedia.org/r/620651

Change 623525 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] mariadb-backups: Fix missing check on no ERROR msgs on logs

https://gerrit.wikimedia.org/r/623525

Change 623530 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfbackups@master] mariadb-backups: Fix missing check on no ERROR msgs on logs

https://gerrit.wikimedia.org/r/623530

Change 623532 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfbackups@master] mariadb-backups: Fix missing check on no ERROR msgs on logs

https://gerrit.wikimedia.org/r/623532

Change 623530 abandoned by Jcrespo:
[operations/software/wmfbackups@master] mariadb-backups: Fix missing check on no ERROR msgs on logs

Reason:
duplicate

https://gerrit.wikimedia.org/r/623530

Change 623525 abandoned by Jcrespo:
[operations/software/wmfmariadbpy@master] mariadb-backups: Fix missing check on no ERROR msgs on logs

Reason:
duplicate

https://gerrit.wikimedia.org/r/623525

Change 623538 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update backup logic to check errors on log

https://gerrit.wikimedia.org/r/623538

Change 623538 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update backup logic to check errors on log

https://gerrit.wikimedia.org/r/623538

Change 623532 merged by jenkins-bot:
[operations/software/wmfbackups@master] mariadb-backups: Fix missing check on no ERROR msgs on logs

https://gerrit.wikimedia.org/r/623532

Change 626172 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfbackups@master] remote_backup: Instead of using a preassigned port, autoselect one

https://gerrit.wikimedia.org/r/626172

jcrespo updated the task description. Sep 9 2020, 3:13 PM

Change 628163 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Stop using puppet to deploy wmfbackups and use debian packages

https://gerrit.wikimedia.org/r/628163

Change 628168 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfbackups@master] cli: Make /etc/wmfbackups the config dir for the main backup scripts

https://gerrit.wikimedia.org/r/628168

Change 628172 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfbackups@master] [WIP] Use the shared list of sections to validate backup checks

https://gerrit.wikimedia.org/r/628172

Change 626172 merged by Jcrespo:
[operations/software/wmfbackups@master] remote_backup: Instead of using a preassigned port, autoselect one

https://gerrit.wikimedia.org/r/626172
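
The idea behind the change above, auto-selecting a port instead of using a preassigned one, can be illustrated with a common technique: binding to port 0 so the operating system picks a free port. This is a generic sketch and not necessarily how remote_backup does it; the helper name `pick_free_port` is hypothetical.

```python
import socket


def pick_free_port(host=""):
    """Ask the operating system for a currently unused TCP port
    (hypothetical helper, not taken from wmfbackups)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel choose any free port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(pick_free_port("127.0.0.1"))
```

Note that the port is released when the socket closes, so another process could claim it before it is reused; a real tool would need to handle that race, for example by retrying.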

Change 628168 merged by Jcrespo:
[operations/software/wmfbackups@master] cli: Make /etc/wmfbackups the config dir for the main backup scripts

https://gerrit.wikimedia.org/r/628168

Change 629076 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Start using the package instead of the script on puppet

https://gerrit.wikimedia.org/r/629076

Change 629076 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Start using the package instead of the script on puppet

https://gerrit.wikimedia.org/r/629076

Change 629092 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Use the wmfbackups-remote package and not the puppet version

https://gerrit.wikimedia.org/r/629092

Change 628163 abandoned by Jcrespo:
[operations/puppet@production] mariadb: Stop using puppet to deploy wmfbackups and use debian packages

Reason:
split into 3 other patches, once per role

https://gerrit.wikimedia.org/r/628163

Change 629101 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Stop using puppet to deploy wmfbackups and use debian packages

https://gerrit.wikimedia.org/r/629101

Change 629101 merged by Jcrespo:
[operations/puppet@production] mariadb: Stop using puppet to deploy wmfbackups and use debian packages

https://gerrit.wikimedia.org/r/629101

Change 629092 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Use the wmfbackups-remote package and not the puppet version

https://gerrit.wikimedia.org/r/629092

jcrespo updated the task description. Sep 29 2020, 10:58 AM

Change 643220 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfbackups@master] Add new option, stats_file, to prevent sending passwords over the network/logs

https://gerrit.wikimedia.org/r/643220

Change 643223 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Make use of the new stats-file options, preventing password logging

https://gerrit.wikimedia.org/r/643223

Change 643300 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Make sure mydumper and xtrabackup are installed on the right hosts

https://gerrit.wikimedia.org/r/643300

Change 643220 merged by Jcrespo:
[operations/software/wmfbackups@master] Add new option, stats_file, to prevent sending passwords over the network/logs

https://gerrit.wikimedia.org/r/643220

Change 643223 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Use new stats-file option, preventing password logging

https://gerrit.wikimedia.org/r/643223

Change 643300 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Make sure mydumper and xtrabackup are installed

https://gerrit.wikimedia.org/r/643300

jcrespo updated the task description. Mon, Feb 8, 1:33 PM
jcrespo added a subscriber: LSobanski.

Change 662740 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] [WIP] Move database backups-related puppet code to its own profile/role

https://gerrit.wikimedia.org/r/662740

Change 663221 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[labs/private@master] dbbackups: Update password locations for database-backups db

https://gerrit.wikimedia.org/r/663221

Change 663221 merged by Jcrespo:
[labs/private@master] dbbackups: Update password locations for database-backups db

https://gerrit.wikimedia.org/r/663221

Change 662740 merged by Jcrespo:
[operations/puppet@production] dbbackups: Move database backups-related puppet code to its own profile/role

https://gerrit.wikimedia.org/r/662740

Change 663649 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbbackups: Create new puppet module dbbackups, move backup check to it

https://gerrit.wikimedia.org/r/663649

Change 663649 merged by Jcrespo:
[operations/puppet@production] dbbackups: Create new puppet module dbbackups, move backup check to it

https://gerrit.wikimedia.org/r/663649