
Process for user backups
Closed, Declined · Public

Description

The basic requirement is to allow Labs end users to recover past versions of files and databases, for both disaster recovery and error recovery.

This used to work with timetravel snapshots, but that feature was disabled in late 2013 when it was suspected of causing instabilities, and it has remained disabled since. Now that the true culprit has been found, reenabling it should be the first layer of redundancy.

In addition, replication of hourly snapshots between the DCs will provide cross-site backups.

As this covers only the filesystems, users also need a mechanism for making safety backups of databases. In practice, a simple mechanism for scheduling dumps of a database to the filesystem would be enough, since the filesystem backups would then provide the redundancy.
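The task does not specify what the scheduled dump mechanism would look like. As a minimal sketch only, assuming a MySQL/MariaDB database and the standard `mysqldump` client, a script along these lines could be run by any scheduler. The function names, the file naming scheme, and the `keep` retention count are all hypothetical choices made for illustration, not part of any existing Labs tooling.

```python
import datetime
import pathlib
import subprocess


def dump_path(directory, database, now):
    """Return a timestamped dump path, e.g. mydb-20150210T213400.sql."""
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return pathlib.Path(directory) / f"{database}-{stamp}.sql"


def build_dump_command(database, defaults_file):
    """Build the mysqldump argv; credentials are read from the defaults file."""
    return [
        "mysqldump",
        f"--defaults-file={defaults_file}",  # must be the first option
        "--single-transaction",  # consistent InnoDB dump without locking tables
        database,
    ]


def prune_old_dumps(directory, database, keep):
    """Delete all but the newest `keep` dumps and return the removed paths."""
    # The timestamp format sorts chronologically, so a plain sort is enough.
    dumps = sorted(pathlib.Path(directory).glob(f"{database}-*.sql"))
    removed = dumps[:-keep] if keep > 0 else dumps
    for path in removed:
        path.unlink()
    return removed


def run_dump(database, directory, defaults_file, keep=7):
    """Dump `database` into `directory`, then prune older dumps."""
    target = dump_path(directory, database, datetime.datetime.now())
    # Write to a temporary name first so a failed dump never looks complete.
    partial = target.with_suffix(".sql.partial")
    with open(partial, "wb") as out:
        subprocess.run(
            build_dump_command(database, defaults_file), stdout=out, check=True
        )
    partial.replace(target)
    prune_old_dumps(directory, database, keep)
    return target
```

Because the dumps land on the regular filesystem, they would be picked up by the snapshots and the cross-site replication described above without any database-specific backup path. Note that the glob used for pruning would also match a second database whose name starts with `<database>-`, so a real tool would need a stricter naming scheme.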

Event Timeline

coren raised the priority of this task from to Needs Triage.
coren updated the task description.
coren added a project: Cloud-Services.
coren subscribed.
coren set Security to None.
coren triaged this task as Medium priority. (Feb 10 2015, 9:34 PM)
coren moved this task from Triage to Stalled on the Cloud-Services board.

Bugs in LVM2 make thin snapshots iffy; I'm backporting lvm2 to Precise to fix that.
(WIP at https://launchpad.net/~marc-u/+archive/ubuntu/wmf/+packages)

Jessie saves. Snapshots are back and working, but not yet user-accessible (design work will be needed; perhaps automount?). At the very least, once we turn the feature on, admins can recover user files.

This is now working on the new (not live) filesystem. Only the switch itself is still pending.

Change 199267 had a related patch set uploaded (by coren):
WIP: Proper labs_storage class

https://gerrit.wikimedia.org/r/199267

Change 199267 abandoned by coren:
WIP: Proper labs_storage class

Reason:
Superseded by https://gerrit.wikimedia.org/r/220618

https://gerrit.wikimedia.org/r/199267

yuvipanda claimed this task.
yuvipanda subscribed.

I do not think we should offer time travel backups that are user recoverable, at least for the time being. Let's work on getting actual DR backups running and improving the performance of the system first. For now, users can always recover files by asking an admin (who can look in the latest snapshot if we have continuous backups running).

We can revisit this at a later point if desired.