Re-evaluate Fundraising database backup retention
Closed, Resolved, Public

Description

We need to adjust our database backup strategy to be able to comply with GDPR and other donor privacy standards. The current scheme makes it virtually impossible to delete specific data from backups, which is a requirement triggered for example by GDPR right-to-be-forgotten requests.

Current Backup Scheme

  • nightly full mysqldump from each replica db
  • central collection of dumps to Fundraising archive server
  • retention scheme gradually fades away backups as they age
    • 0-7 days: keep all files (including redundant copies dumped separately)
    • 8-30 days: keep one file per day
    • 31-180 days: keep only the 1st/15th of the month
    • 181 days - 5 years: keep only 1st of the month
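The tiers above can be read as a per-file keep/delete rule. The following is only a minimal sketch of that reading, not the actual purge script: the function name `keep_backup`, the `is_primary_daily` flag (marking the one file per day that survives once redundant copies are dropped), and the approximation of 5 years as 5 * 365 days are all assumptions made for illustration.

```python
from datetime import date

# Assumption: "5 years" is approximated as 5 * 365 days.
MAX_AGE_DAYS = 5 * 365


def keep_backup(backup_date: date, today: date, is_primary_daily: bool = True) -> bool:
    """Return True if a dump taken on backup_date should still be kept today.

    is_primary_daily is a hypothetical flag: True for the single file per day
    that is retained once redundant copies are no longer kept.
    """
    age = (today - backup_date).days
    if age < 0:
        raise ValueError("backup_date is in the future")
    if age <= 7:
        # 0-7 days: keep all files, including redundant copies
        return True
    if age <= 30:
        # 8-30 days: keep one file per day
        return is_primary_daily
    if age <= 180:
        # 31-180 days: keep only the 1st and 15th of the month
        return is_primary_daily and backup_date.day in (1, 15)
    if age <= MAX_AGE_DAYS:
        # 181 days - 5 years: keep only the 1st of the month
        return is_primary_daily and backup_date.day == 1
    return False
```

For example, evaluated on 2022-04-02, a redundant copy from 2022-03-30 is kept, while a dump from 2022-01-16 is not, because it is older than 30 days and was not taken on the 1st or 15th.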

Use Cases vs Backup Timeframes

  1. Hardware/software failure: We use the latest viable backup if all database instances have failed. The value of anything older than a day is as a hedge against problems with the backup process.
  2. Data loss or corruption: We would use the latest unaffected backup. The longer we retain backups, the longer the window of time we have to discover data loss before we lose it from backups too.
  3. Forensics: In theory we would examine earlier versions of the database to discover where an event or bug occurred. With the current backup strategy we have fairly complete data for 30 days, after which it becomes very sparse for the remainder of the 5 years we retain data. But it takes a database server 50+ hours to restore a dump, so the feasibility of using historical snapshots is poor.
  4. QA testing: We always use the latest backup, but we can also clone a running database.
  5. Studying growth trends: We did this once to fill in the gaps before we automated collection of the relevant data.

Recommendations

  • Establish a 30-day minimum retention standard: This is the number of days for which we do not delete or tamper with backups. This is decent coverage for data recovery, and is essentially our current standard.
  • Stop retaining 1st/15th backups: These only have value for forensics, and even there the value is very limited. They are a liability for GDPR and similar standards because there's no practical way to selectively delete data from them.
  • Align the data deletion compliance policy with our minimum retention standard: For example, "Data deleted from CiviCRM will be removed from backups within 30 days."

Optional
In general, the longer we retain backups, the more time we have to discover data loss or corruption before we lose data permanently. But there are diminishing returns as the backups age, and deciding the exact point where we should delete is a guessing game. We could establish a separate, longer maximum retention policy and automate deletes to that longer standard. Backups aged between the minimum and maximum standards would be treated as "nice to have" and manually deleted by protocol any time we get a data delete request. From an operations perspective this is very easy to implement; however, we would need to be notified after data is deleted from CiviCRM.
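As a rough illustration of the optional min/max scheme, the sketch below classifies a backup by age and lists the files that would be manually purged when a data delete request arrives. The 30-day minimum comes from the recommendations; the function names, the band labels, and the `max_retention_days` parameter are hypothetical, and no particular maximum is implied.

```python
from datetime import date
from typing import Iterable, List

MIN_RETENTION_DAYS = 30  # minimum retention standard from the recommendations


def classify_backup(backup_date: date, today: date, max_retention_days: int,
                    min_retention_days: int = MIN_RETENTION_DAYS) -> str:
    """Place a backup in one of three bands (labels are illustrative only)."""
    if max_retention_days < min_retention_days:
        raise ValueError("maximum retention must be >= minimum retention")
    age = (today - backup_date).days
    if age <= min_retention_days:
        return "protected"      # never deleted or tampered with
    if age <= max_retention_days:
        return "nice_to_have"   # kept until a data delete request arrives
    return "expired"            # removed by the automated purge


def purge_list_for_delete_request(backup_dates: Iterable[date], today: date,
                                  max_retention_days: int) -> List[date]:
    """Backups to delete manually on a data delete request: everything
    older than the minimum retention window that is still on disk."""
    return sorted(
        d for d in backup_dates
        if classify_backup(d, today, max_retention_days) != "protected"
    )
```

With a hypothetical 90-day maximum and a request handled on 2022-04-02, a backup from 2022-02-01 would land on the purge list, while one from 2022-03-20 would stay untouched.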

Related Objects

Status    Assigned
Open      None
Resolved  Jgreen

Event Timeline

Jgreen triaged this task as Medium priority.
Jgreen moved this task from In Progress to Done on the fundraising-tech-ops board.

We discussed this in the FR Tech meeting on 2022-04-02 and agreed that these recommendations are OK. The backup purge schedule has been adjusted to keep daily backups for 90 days, and we'll do a one-time manual 30-day purge to clear the backups for T306192.