Make sure tools-db is backed up in some form
Closed, Duplicate (Public)

Description

I think the production databases are backed up, so tools-db should be too. Delayed replication and/or dumps?

Event Timeline

yuvipanda assigned this task to coren.
yuvipanda raised the priority of this task to High.
yuvipanda updated the task description.
yuvipanda added a project: Cloud-Services.
yuvipanda added subscribers: faidon, mark, Aklapper, yuvipanda.

(Presumably, back up offsite)

This requires one of two things: either we dump the database to labstore2001 (which also gets rsyncs of labstores) or we add a DB to codfw and slave it.

What is the purpose of the backups? If it is to guard against hardware failures & Co., I assume some replication to another server that can be pointed to by tools-db would be the best way (and this task thus a duplicate of T88718).

Hinting that user databases are backed up will probably provoke support requests from people who want a table row restored to the state it had 39.437 days ago. In addition, I assume much (most?) of the data in the user databases is derived from replicated data and could thus be regenerated. So for "user database backups" in that sense, I would instead recommend advertising that users who need them should set up a cron job that calls mysqldump tailored to their requirements.
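As an illustration of the cron-driven mysqldump approach suggested above, here is a minimal sketch of a script a tool maintainer might run. The host name `tools-db`, the credentials file `~/replica.my.cnf`, and the function names are assumptions made for this example, not a documented interface.

```python
import gzip
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def build_dump_command(database, tables=(), host="tools-db",
                       defaults_file="~/replica.my.cnf"):
    """Return the mysqldump argument list for one database (optionally only some tables)."""
    return [
        "mysqldump",
        # --defaults-file has to be the first option; the file is assumed to hold user/password.
        f"--defaults-file={Path(defaults_file).expanduser()}",
        f"--host={host}",
        # Dump InnoDB tables from a consistent snapshot without locking them.
        "--single-transaction",
        database,
        *tables,
    ]


def dump_path(backup_dir, database, now):
    """Build a timestamped file name so successive dumps do not overwrite each other."""
    return Path(backup_dir) / f"{database}-{now:%Y%m%d-%H%M%S}.sql.gz"


def run_dump(database, backup_dir, tables=()):
    """Stream mysqldump output into a gzip file and return its path."""
    target = dump_path(backup_dir, database, datetime.now())
    target.parent.mkdir(parents=True, exist_ok=True)
    proc = subprocess.Popen(build_dump_command(database, tables),
                            stdout=subprocess.PIPE)
    with gzip.open(target, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.wait() != 0:
        target.unlink()  # do not keep a truncated dump around
        raise RuntimeError(f"mysqldump exited with status {proc.returncode}")
    return target
```

A cron entry could then call `run_dump()` with the tool's own database name at whatever interval the maintainer considers appropriate; old dumps would still need to be pruned separately.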

@scfc: This is DR backups, not partially restorable backups.

Once T88718 is finished (most of the work has been done already), backups can be taken from the slave consistently.

A slave replica will protect against:

  • Hardware issues (e.g. secondary storage broken, server fried in general)
  • Admin and security issues (data is rm'ed accidentally/by an attacker)

Pointing clients to the slave is trivial, and could even be done automatically.

A slave replica will not protect against:

  • Software logic errors (a bad SQL command is executed and data is logically DROPped, or MySQL has a bug that causes data loss).

For that, there are 2 options:

  • Regular periodic backups, which allow going back to the most recent backup, i.e. by at most one backup period
  • A delayed slave, where in the event of a bad SQL command there will be X hours until the slave executes it.
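The trade-off between the two options can be made concrete with a little date arithmetic. The sketch below is only an illustration: the function names are hypothetical, and since the thread leaves the delay X unspecified, any concrete value is an assumption. (In MySQL 5.6 and later such a delay is configured with the `MASTER_DELAY` option of `CHANGE MASTER TO`; whether the tools-db setup supports it is not stated here.)

```python
from datetime import datetime, timedelta


def apply_time_on_delayed_replica(executed_on_master, delay):
    """Moment at which a delayed replica will replay a statement run on the master."""
    return executed_on_master + delay


def time_left_to_intervene(executed_on_master, delay, now):
    """How long an admin still has to stop replication before the bad statement is replayed."""
    remaining = apply_time_on_delayed_replica(executed_on_master, delay) - now
    return max(remaining, timedelta(0))


def changes_lost_on_restore(last_backup, incident):
    """With periodic backups, everything written since the last backup is lost on restore."""
    return incident - last_backup
```

For example, with an assumed delay of 24 hours, a bad statement noticed six hours after it ran leaves 18 hours to stop the replica, whereas restoring from a nightly dump loses whatever was written since that dump.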

The problem with backups on that host is that there are some largish databases that are directly derived from the MySQL production replicas and are not worth recovering. I would suggest doing a user poll or selecting specific databases, to avoid duplicating 500 GB.

@scfc: This is DR backups, not partially restorable backups.

and I'd repeat my comment T88716#1181437:

[…]
Hinting that user databases are backed up will probably provoke support requests from people who want a table row restored to the state it had 39.437 days ago. In addition, I assume much (most?) of the data in the user databases is derived from replicated data and could thus be regenerated. So for "user database backups" in that sense, I would instead recommend advertising that users who need them should set up a cron job that calls mysqldump tailored to their requirements.

I can imagine scenarios where some form of point-in-time recovery is useful from a DBA perspective (imagine a maintenance script that is pointed at the wrong DB host and drops all databases), but IMHO users should be advised to mysqldump the tables they think are important at intervals they deem appropriate. So if you think that replication guards against hardware issues, and that a DB root accidentally dropping databases is unlikely enough, I would consider this task resolved (or a duplicate of T88718).

Just a comment: I would make sure to announce on the labs list that there is no recovery guarantee (and I intend to do so).