
Improve regular production database backups handling
Open, Medium, Public

Description

  • Documentation, documentation, documentation
  • More flexibility: if possible per-table CSVs
  • More flexibility: Physical backups
  • Better recovery documentation: "one line to recover" (there is now a recover_section.py)
  • Faster point in time recovery/premade tools
  • Better compression
  • Prepare based on name of the backup, not just the section
  • Option optimization (e.g. double the use_memory)
  • More detailed health checks of backups (size, failures, objects, ...). E.g. check size is within a percentage of the previous backup.
  • Identify failures once a timeout of X has passed, and make it easy to clean up leftover files (probably on T205627)
  • Purge old metadata and make sure logs are rotated T205627
  • Review and improve logging (beyond metadata)
  • Retry once after an initial failure
  • More optimization of certain database tables
  • Maybe some kind of locking of backups and/or transfer.py to prevent concurrent actions on the same source or target servers
  • Differential backups
  • Document (and potentially alert on) the last edit time of some sample tables (e.g. recentchanges or revision) to verify that the source databases are up to date (e.g. in case their master or intermediate master has replication stopped, or some other issue causes recent backups to contain stale data)
  • Have a quick way to see which backup sources belong to each section (tendril, dashboard)
  • Document and/or automate best server configuration for fast dump load (e.g. disable checksums, innodb transactionality, etc.)
  • Allow editing per-table options such as the engine and compression
  • Work around the "myloader doesn't import empty dbs" bug
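The item about documenting the last edit time of sample tables could be checked roughly as below. This is a minimal sketch, not existing WMF tooling: the function names, the one-hour `max_lag` default, and the idea of feeding it the result of `SELECT MAX(rc_timestamp) FROM recentchanges` are all assumptions for illustration. It relies only on MediaWiki storing timestamps as 14-digit UTC strings (YYYYMMDDHHMMSS).

```python
from datetime import datetime, timedelta, timezone
from typing import Union

# MediaWiki stores timestamps (e.g. rc_timestamp) as 14-digit UTC strings: YYYYMMDDHHMMSS.
MW_TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_mw_timestamp(value: Union[str, bytes]) -> datetime:
    """Convert a MediaWiki-style timestamp into a timezone-aware UTC datetime."""
    if isinstance(value, bytes):
        # Binary columns may be returned as bytes by MySQL client libraries.
        value = value.decode("ascii")
    return datetime.strptime(value, MW_TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def is_data_stale(last_edit: Union[str, bytes], backup_time: datetime,
                  max_lag: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the newest sampled edit is more than max_lag older than the backup.

    Hypothetical usage: last_edit is the result of a query such as
    SELECT MAX(rc_timestamp) FROM recentchanges, run on the backup source
    around the time the backup is taken. The 1-hour default is an assumption.
    """
    lag = backup_time - parse_mw_timestamp(last_edit)
    return lag > max_lag
```

For example, with `backup_time = datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc)`, a last edit of `"20200501115500"` (5 minutes behind) is not flagged, while `"20200430120000"` (24 hours behind) is. A single fixed threshold would likely need per-section tuning, since small wikis can legitimately go hours without edits; storing the observed lag in the backup metadata and alerting on it may be more robust than a hard cutoff.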

Details

| Project | Branch | Lines +/- |
| --- | --- | --- |
| operations/puppet | production | +15 -19 |
| operations/puppet | production | +0 -2 |
| operations/software/wmfmariadbpy | master | +2 -2 |
| operations/puppet | production | +9 -9 |
| operations/puppet | production | +3 -1 |
| operations/puppet | production | +2 -662 |
| operations/puppet | production | +2 -0 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +2 -2 |
| operations/software | master | +3 -3 |
| operations/puppet | production | +29 -28 |
| operations/puppet | production | +3 -2 |
| operations/software/wmfmariadbpy | master | +2 -2 |
| operations/puppet | production | +2 -2 |
| operations/puppet | production | +19 -9 |
| operations/puppet | production | +63 -22 |
| operations/puppet | production | +8 -10 |
| operations/puppet | production | +2 -2 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +1 -0 |
| operations/puppet | production | +3 -1 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +10 -10 |
| operations/puppet | production | +8 -2 |
| operations/puppet | production | +0 -2 |
| operations/puppet | production | +13 -12 |

Related Objects

| Status | Assigned |
| --- | --- |
| Resolved | RobH |
| Resolved | fgiunchedi |
| Open | None |
| Resolved | jcrespo |
| Open | None |
| Resolved | jcrespo |
| Open | jcrespo |
| Stalled | None |
| Declined | None |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Open | None |
| Open | None |
| Open | None |
| Resolved | jcrespo |
| Resolved | Cmjohnson |
| Resolved | Cmjohnson |
| Resolved | Cmjohnson |
| Resolved | jcrespo |
| Resolved | Marostegui |
| Resolved | RobH |
| Resolved | Andrew |
| Resolved | Cmjohnson |
| Resolved | jcrespo |
| Resolved | Cmjohnson |
| Resolved | Cmjohnson |
| Resolved | jcrespo |
| Resolved | Cmjohnson |
| Resolved | jcrespo |
| Resolved | Papaul |
| Resolved | Marostegui |
| Resolved | RobH |
| Resolved | RobH |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | None |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Declined | Marostegui |
| Resolved | mark |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Open | None |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Duplicate | None |
| Open | None |
| Open | None |
| Open | None |
| Open | None |
| Open | None |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Reedy |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Declined | None |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Ladsgroup |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Resolved | Marostegui |
| Stalled | Marostegui |
| Open | None |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | Papaul |
| Declined | None |
| Resolved | Guozr.im |
| Resolved | jcrespo |
| Resolved | Papaul |
| Resolved | jcrespo |
| Resolved | Papaul |
| Resolved | Cmjohnson |
| Resolved | Papaul |
| Resolved | jcrespo |
| Resolved | jcrespo |
| Resolved | L0st3xpl0r3r |
| Open | jcrespo |
| Resolved | Papaul |
| Open | Cmjohnson |

Event Timeline

jcrespo renamed this task from "Improve db backup handling" to "Improve regular production database backups handling". Apr 12 2017, 10:26 AM
jcrespo raised the priority of this task from Low to Medium.
jcrespo moved this task from Backlog to Meta/Epic on the DBA board.
jcrespo updated the task description.
jcrespo updated the task description. Mar 9 2018, 3:16 PM
greg removed a subscriber: greg. Apr 11 2019, 9:20 PM
jcrespo updated the task description. Jun 25 2019, 8:58 AM
jcrespo updated the task description. Dec 2 2019, 10:03 AM
jcrespo updated the task description. Dec 10 2019, 3:13 PM
jcrespo updated the task description.

Change 573961 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update code to wmfbackup HEAD to fix stalling issue

https://gerrit.wikimedia.org/r/573961

Change 573961 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update code to wmfbackup HEAD to fix stalling issue

https://gerrit.wikimedia.org/r/573961

Change 577462 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Increase snapshot frequency and retention (6 days)

https://gerrit.wikimedia.org/r/577462

Change 577462 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Increase snapshot frequency and retain those on bacula

https://gerrit.wikimedia.org/r/577462

Change 579892 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move x1 backups from dbprov[12]001 to dbprov[12]002

https://gerrit.wikimedia.org/r/579892

Change 579892 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move x1 backups from dbprov[12]001 to dbprov[12]002

https://gerrit.wikimedia.org/r/579892

Change 579894 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579894

Change 579894 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579894

Change 579901 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579901

Change 579901 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Change bacula backup frequency of dbprov to weekly

https://gerrit.wikimedia.org/r/579901

Change 582791 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Skip Saturday snapshot to prevent a retention of 4 copies

https://gerrit.wikimedia.org/r/582791

Change 582791 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Skip Saturday snapshot to prevent a retention of 4 copies

https://gerrit.wikimedia.org/r/582791

Change 583049 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Stop replication for s3 when snapshotting

https://gerrit.wikimedia.org/r/583049

Change 583049 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Stop replication for s3 when snapshotting

https://gerrit.wikimedia.org/r/583049

Change 584599 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] partman: Reference bacula recipe (unused to prevent accidental formatting)

https://gerrit.wikimedia.org/r/584599

Change 584599 merged by Jcrespo:
[operations/puppet@production] partman: Reference bacula recipe (unused to prevent accidental formatting)

https://gerrit.wikimedia.org/r/584599

Change 588668 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] bacula: Increase max total size of Databases backups to 40 TB

https://gerrit.wikimedia.org/r/588668

Change 588668 merged by Jcrespo:
[operations/puppet@production] bacula: Increase max total size of Databases backups to 40 TB

https://gerrit.wikimedia.org/r/588668

Change 589266 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move all backup config templates to its own subdir

https://gerrit.wikimedia.org/r/589266

Change 589266 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move all backup config templates to its own subdir

https://gerrit.wikimedia.org/r/589266

Change 591309 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move OK alert description to wikitech

https://gerrit.wikimedia.org/r/591309

Change 591309 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move OK alert description to wikitech

https://gerrit.wikimedia.org/r/591309

Change 591326 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Check backups size also based on previous runs

https://gerrit.wikimedia.org/r/591326

Change 591326 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Check backups size also based on previous runs

https://gerrit.wikimedia.org/r/591326
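The idea behind this change, and the description item "check size is within a percentage of the previous backup", can be sketched as follows. This is a hypothetical illustration, not the code merged in change 591326: the function name, the message strings, and the 15% default threshold are assumptions. It compares against the most recent previous run only; a median over several previous runs would be a more outlier-resistant alternative.

```python
from typing import Optional, Sequence


def check_size(current: int, previous: Sequence[int],
               max_change: float = 0.15) -> Optional[str]:
    """Return a warning if the backup size looks suspicious, else None.

    current:    size in bytes of the backup that just finished.
    previous:   sizes of earlier successful runs, oldest first (assumed ordering).
    max_change: allowed relative deviation from the last run (hypothetical 15%).
    """
    if current <= 0:
        return "backup is empty"
    if not previous or previous[-1] <= 0:
        return None  # nothing meaningful to compare against (e.g. first run)
    reference = previous[-1]
    change = (current - reference) / reference
    if abs(change) > max_change:
        return f"size changed by {change:+.1%} relative to previous backup"
    return None
```

A relative threshold like this flags both sudden shrinkage (a likely failed or truncated dump) and unexpected growth; the threshold would probably need tuning per section to avoid the false positives that later changes in this task adjust for.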

Change 592599 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Tune backup size monitoring thresholds

https://gerrit.wikimedia.org/r/592599

Change 592599 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Tune backup size monitoring thresholds

https://gerrit.wikimedia.org/r/592599

jcrespo updated the task description. Apr 27 2020, 8:20 AM

Change 592608 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/592608

Change 592608 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/592608

Change 594096 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] transfer.py: Set timeout to 5 minutes

https://gerrit.wikimedia.org/r/594096

Change 594096 merged by Jcrespo:
[operations/software/wmfmariadbpy@master] transfer.py: Set timeout to 5 minutes

https://gerrit.wikimedia.org/r/594096
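A 5-minute timeout combined with the description item "retry once after an initial failure" could look roughly like this. This is a generic sketch using the standard library's `subprocess.run`, not transfer.py's actual implementation; `run_with_retry` and its parameters are hypothetical names introduced here.

```python
import subprocess
from typing import Sequence

TIMEOUT_SECONDS = 5 * 60  # the 5-minute value named in change 594096


def run_with_retry(cmd: Sequence[str], timeout: float = TIMEOUT_SECONDS,
                   retries: int = 1) -> bool:
    """Run cmd, retrying up to `retries` extra times on timeout or non-zero exit.

    Returns True as soon as one attempt exits with status 0, False otherwise.
    """
    for _ in range(retries + 1):
        try:
            result = subprocess.run(list(cmd), timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run kills and reaps the child before raising.
            continue
        if result.returncode == 0:
            return True
    return False
```

Keeping the retry count at one limits how long a genuinely broken source or target can hold up the backup schedule, while still absorbing transient failures.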

Change 594099 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/594099

Change 594099 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Update transfer.py to HEAD

https://gerrit.wikimedia.org/r/594099

Change 596189 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Switchover zarcillo from db1115 to db2093

https://gerrit.wikimedia.org/r/596189

Change 596197 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbtools: point zarcillo database scripts to use db2093

https://gerrit.wikimedia.org/r/596197

Change 596189 merged by Jcrespo:
[operations/puppet@production] mariadb: Switchover zarcillo from db1115 to db2093

https://gerrit.wikimedia.org/r/596189

Change 596197 merged by Jcrespo:
[operations/software@master] dbtools: point zarcillo database scripts to use db2093

https://gerrit.wikimedia.org/r/596197

Change 596242 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: switch monitoring of tendril/zarcillo backups to codfw

https://gerrit.wikimedia.org/r/596242

Change 596242 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: switch monitoring of tendril/zarcillo backups to codfw

https://gerrit.wikimedia.org/r/596242

Change 596251 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Allow zarcillo as a valid backup section

https://gerrit.wikimedia.org/r/596251

Change 596251 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Allow zarcillo as a valid backup section

https://gerrit.wikimedia.org/r/596251

Change 597005 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Disable monitoring screens on database backup hosts

https://gerrit.wikimedia.org/r/597005

Change 597005 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Disable monitoring screens on database backup hosts

https://gerrit.wikimedia.org/r/597005

Change 608053 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move transferpy deployment to debian package

https://gerrit.wikimedia.org/r/608053

Change 608053 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move transferpy deployment to debian package

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608053

Change 609101 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Fix remote_backup_mariadb.py deps and path

https://gerrit.wikimedia.org/r/c/operations/puppet/+/609101

Change 609101 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Fix remote_backup_mariadb.py deps and path

https://gerrit.wikimedia.org/r/c/operations/puppet/+/609101

Change 615155 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Adjust check parameters to get less false positives

https://gerrit.wikimedia.org/r/615155

Change 615155 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Adjust check parameters to get less false positives

https://gerrit.wikimedia.org/r/615155

Change 618720 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Reenable notifications for dbprov2003 after maintenance

https://gerrit.wikimedia.org/r/618720

Change 618722 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb-backups: Move x1 and misc logical dumps to dbprov1003

https://gerrit.wikimedia.org/r/618722

Change 618723 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software/wmfmariadbpy@master] BackupStatistics: Do not raise an exception if metadata cannot be sent

https://gerrit.wikimedia.org/r/618723

Change 618723 abandoned by Jcrespo:
[operations/software/wmfmariadbpy@master] BackupStatistics: Do not raise an exception if metadata cannot be sent

Reason:
not a bug

https://gerrit.wikimedia.org/r/618723

Change 618720 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Reenable notifications for dbprov2003 after maintenance

https://gerrit.wikimedia.org/r/618720

Change 618722 merged by Jcrespo:
[operations/puppet@production] mariadb-backups: Move x1 and misc logical dumps to dbprov1003

https://gerrit.wikimedia.org/r/618722