
Improve regular production database backups handling
Open, NormalPublic

Description

  • Documentation, documentation, documentation
  • More flexibility: if possible per-table CSVs
  • More flexibility: Physical backups
  • Better recovery documentation: "one line to recover" (there is now a recover_section.py)
  • Faster point in time recovery/premade tools
  • Better compression
  • Prepare based on name of the backup, not just the section
  • Option optimization (e.g. double the use_memory)
  • Identify failures once a timeout (X amount of time) has passed, and make cleanup of leftover files easy (probably on T205627)
  • Purge old metadata and make sure logs are rotated (T205627)
  • Review and improve logging (beyond metadata)
  • 1 retry after initial failure
  • More optimization of certain database tables
  • Maybe some kind of locking of backups and/or transfer.py to prevent concurrent actions on the same source or target servers
  • Differential backups
  • More detailed health checks of backups (size, failures, objects, ...). E.g. check size is within a percentage of the previous backup.
  • Have a quick way to see which backup sources belong to each section (tendril, dashboard)
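
The size health check mentioned above ("check size is within a percentage of the previous backup") could look roughly like the following. This is only a minimal sketch: the function name, the parameters and the 5% default threshold are assumptions made for illustration, not part of the existing backup tooling.

```python
def size_within_tolerance(current_bytes, previous_bytes, max_change_pct=5.0):
    """Return True if the new backup size is within max_change_pct percent
    of the previous backup size.

    All names and the default threshold are hypothetical, chosen only to
    illustrate the check described in the task.
    """
    if previous_bytes <= 0:
        # No usable baseline (first backup, or the previous one was empty):
        # treat it as a failed check so that a human reviews it.
        return False
    # Relative change in percent, in either direction (growth or shrinkage).
    change_pct = abs(current_bytes - previous_bytes) * 100.0 / previous_bytes
    return change_pct <= max_change_pct


if __name__ == "__main__":
    # A backup that grew 4% over the previous one passes a 5% threshold.
    print(size_within_tolerance(104 * 1024**3, 100 * 1024**3))
    # A backup that shrank to half the previous size does not.
    print(size_within_tolerance(50 * 1024**3, 100 * 1024**3))
```

A sudden shrink is usually the more worrying direction (for example, a table missing from the backup), so a real check might use different thresholds for growth and shrinkage.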

Event Timeline

jcrespo created this task. Jun 24 2016, 9:00 AM
Restricted Application added subscribers: Zppix, Aklapper. Jun 24 2016, 9:00 AM

This is to improve handling of cases such as T138516.

jcrespo moved this task from Triage to Backlog on the DBA board. Jun 24 2016, 2:49 PM

Change 296706 had a related patch set uploaded (by Jcrespo):
Correct invalid cron definition; add gtid to backups

https://gerrit.wikimedia.org/r/296706

Change 296706 merged by Jcrespo:
Correct invalid cron definition; add gtid to backups

https://gerrit.wikimedia.org/r/296706

greg added a subscriber: greg. Sep 29 2016, 7:40 PM

This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is still valid, please prioritize it appropriately relative to your other work. If you have any questions, feel free to ask me (Greg Grossmeier).

jcrespo triaged this task as Low priority. Sep 30 2016, 8:24 AM
jcrespo renamed this task from "Improve db backup handling, specially of misc hosts" to "Improve db backup handling". Apr 12 2017, 10:21 AM
jcrespo removed a project: Patch-For-Review.
jcrespo renamed this task from "Improve db backup handling" to "Improve regular production database backups handling". Apr 12 2017, 10:26 AM
jcrespo raised the priority of this task from Low to Normal.
jcrespo moved this task from Backlog to Meta/Epic on the DBA board.
jcrespo updated the task description.
jcrespo updated the task description. Mar 9 2018, 3:16 PM
greg removed a subscriber: greg. Apr 11 2019, 9:20 PM
jcrespo updated the task description. Jun 25 2019, 8:58 AM