
Implement a backup system for the data stored for Grant Metrics
Closed, Resolved · Public · 3 Estimated Story Points

Description

Value proposition (why do we need to do this?)

As a manager, I want to ensure that our users' data is not lost or corrupted. As we invest more in this application, we should be more rigorous about its operational practices.

Functionality/software changes
  • Create a cron job to dump the database to disk. It should run once a day, and the files should be stored on NFS for redundancy (a minimal sketch follows this list).
  • Optional: send an email or some other notice after each successful backup (depending on how often the backup is taken). Redirect the cron job's stdout to /dev/null so that only errors produce output.
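A minimal sketch of what this could look like, assuming a shell script on the tool account, a MariaDB database named grantmetrics, and credentials in ~/replica.my.cnf (the database name, host, and paths here are illustrative, not the actual configuration):

    #!/bin/bash
    # ~/backup.sh -- dump the Grant Metrics database to a dated, compressed file.
    set -euo pipefail
    backup_dir="$HOME/backups"
    mkdir -p "$backup_dir"
    mysqldump --defaults-file="$HOME/replica.my.cnf" \
              --host=tools.db.svc.eqiad.wmflabs grantmetrics \
        | gzip > "$backup_dir/grantmetrics-$(date +%Y%m%d).sql.gz"

with a crontab entry along the lines of:

    # Run daily at 02:00; stdout is discarded so only errors surface.
    0 2 * * * /bin/bash "$HOME/backup.sh" > /dev/null

On Toolforge the job would presumably be submitted through jsub rather than run straight from crontab, which changes the crontab line but not the script itself.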
User interface changes

None

QA/Testing
  • After deployment, ensure that the dump files are valid and that the cron job completes regularly. Restore the data to an empty DB to validate the dump (see the sketch below).
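A sketch of that validation step, reusing the illustrative names from the backup sketch above (the scratch database name is made up):

    # Create a scratch database and load the newest dump into it.
    mysql --defaults-file="$HOME/replica.my.cnf" --host=tools.db.svc.eqiad.wmflabs \
        -e "CREATE DATABASE grantmetrics_restore_test;"
    latest=$(ls -t "$HOME"/backups/grantmetrics-*.sql.gz | head -n 1)
    zcat "$latest" | mysql --defaults-file="$HOME/replica.my.cnf" \
        --host=tools.db.svc.eqiad.wmflabs grantmetrics_restore_test
    # Spot-check that the tables and row counts look sane.
    mysql --defaults-file="$HOME/replica.my.cnf" --host=tools.db.svc.eqiad.wmflabs \
        -e "SHOW TABLES;" grantmetrics_restore_test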

Event Timeline

aezell set the point value for this task to 3. Sep 5 2018, 11:08 PM
aezell added a subscriber: Niharika.

@Niharika Can you prioritize this one?

The first part of this (dumping the database daily to disk) is done. Documented here: https://wikitech.wikimedia.org/wiki/Tool:Grant_Metrics#Backups

What happens if the disk gets full? Is that possible? I don't know how storage is done in this environment.

Niharika triaged this task as Medium priority. Sep 7 2018, 3:27 PM

Are we also going to periodically delete old backups?

I was hoping that using logrotate like this would work, because it handles keeping a set number of copies of the dump and deletes old ones when they drop off the end. However:

    jsub: error: argument program: Program 'logrotate' not found.

So I guess that's out (at least until we move to a VPS). :(

As for disk space, we're talking about such small amounts of data that I don't think we'll run into a problem. The backup at the moment is 40KB.

I'll fix up the backup script to not use logrotate.

Okay, it now keeps 30 days of dumps and doesn't use logrotate (there's a sketch of the rotation after the list below). I've updated the docs. Still to do:

  • give developers an easy way to pull the latest backup into their local DB.
  • mail the ~/backup.err file (which is the stderr output of the script) to maintainers (at the moment, all we'll get is "job submitted successfully").
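The rotation without logrotate can be as simple as a find at the end of the backup script; this is only a sketch of the approach, not necessarily what the actual script does:

    # After a successful dump, prune backups older than 30 days.
    find "$HOME/backups" -name 'grantmetrics-*.sql.gz' -mtime +30 -delete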

The backup report email will now be sent every day, listing the existing backups and their sizes.
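Something along these lines would produce that report (the recipient address is a placeholder, and the exact mail mechanism available to the tool account may differ):

    # List the existing dumps with human-readable sizes and mail the listing to the maintainers.
    ls -lh "$HOME"/backups/grantmetrics-*.sql.gz \
        | mail -s "Grant Metrics backup report" maintainers@example.org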

I'm not quite sure how to handle copying the dumps, though. If we don't mind handing these out to developers, then we're assuming they're completely public info. Is that correct? Is there any worry with handing out participant lists? It seems like there might be, especially as things grow and we add other information.

What do you think of the idea of restoring to the staging site? Then it'd have real data in it, and would also verify the backups.

I think participant lists could potentially be seen as non-public but not private. They could potentially be posted on a wiki page, so it's definitely not private. That said, a relational data set that lets someone query "Show me all events AEzell has participated in" collates data in a way that a myriad of postings across wiki pages doesn't.

With that in mind, I think restoring to staging is a good idea, but putting this data on dev machines is something we'd want to consider only if we really need it. And if we do, we might want to scrub or anonymize some of the data.

Okay, it's now restoring to the staging DB (daily). We'll get emailed any stderr output.
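The restore step presumably looks something like the following, continuing with the illustrative names from above (the staging database name is a placeholder); stderr from it would end up in ~/backup.err:

    # Load the newest dump into the staging database.
    latest=$(ls -t "$HOME"/backups/grantmetrics-*.sql.gz | head -n 1)
    zcat "$latest" | mysql --defaults-file="$HOME/replica.my.cnf" \
        --host=tools.db.svc.eqiad.wmflabs grantmetrics_staging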

I think that's all there is to do here. Can we say we've got a backup system?

Niharika moved this task from QA to Q1 2018-19 on the Community-Tech-Sprint board.