Backup systems (tracking)
Closed, Resolved
Public

Description

Author: elubarsky

Description:
I think there should be a bug to track updates to the docs and eventual implementations.

IMHO we need to make sure text & images are protected from hardware faults, accidental software errors (which can replicate..), and even malicious deletion/corruption by someone gaining access through a security breach.


Version: unspecified
Severity: major
URL: http://wikitech.wikimedia.org/view/Backup_procedures

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz18255.
bzimport created this task. Via Legacy, Mar 30 2009, 12:40 PM
brion added a comment. Via Conduit, Apr 22 2009, 4:48 PM

Assigning fun sysadmin task to Fred, adding CCs for Rob and Tomasz.

The backup procedures/status chart at http://wikitech.wikimedia.org/view/Backup_procedures needs to be audited; it hasn't been updated in several months. Some services have been moved to new servers, and automatic backup jobs might not have been updated; others were last seen still pending full offsite backups, or had only ad-hoc procedures.

We want to make sure all the empty or red spots are filled in and cleaned up, and that we have a good idea how to recover from loss of any one of these services.

Tfinc added a comment. Via Conduit, Apr 22 2009, 5:09 PM

I'd love to have a status page for the backups that's updated regularly. Should be easy enough with a call to the MediaWiki write API upon success or failure.
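A minimal sketch of what such a status update could look like, assuming a backup job reports its result by appending one line to a wiki page through the MediaWiki action API (`action=edit` with `appendtext`). The endpoint path, the page title `Backup_status`, and the helper names are hypothetical; obtaining a login session and a CSRF token is left out.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical values for illustration only: neither the endpoint path
# nor the page title comes from the thread.
API_URL = "https://wikitech.wikimedia.org/w/api.php"
STATUS_PAGE = "Backup_status"


def build_status_edit(job, succeeded, finished_at, csrf_token):
    """Return the POST fields for an action=edit request that appends one status line."""
    state = "OK" if succeeded else "FAILED"
    line = f"* {finished_at:%Y-%m-%d %H:%M} UTC: {job}: {state}"
    return {
        "action": "edit",
        "format": "json",
        "title": STATUS_PAGE,
        "appendtext": "\n" + line,   # append instead of replacing the whole page
        "summary": f"Backup {job}: {state}",
        "token": csrf_token,         # assumed to be fetched beforehand
    }


def encode_body(fields):
    """URL-encode the fields as an application/x-www-form-urlencoded POST body."""
    return urlencode(fields).encode("utf-8")
```

The backup script would POST `encode_body(...)` to `API_URL` at the end of each run, once for success and once for failure, so the page always shows the latest outcome per job.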

brion added a comment. Via Conduit, Apr 22 2009, 5:22 PM

Amen, brother!

(Should this also be integrated into Nagios monitoring/alerts?)
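One way the Nagios side could be approached is a small check plugin that alerts when the newest backup file is too old. The sketch below follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the thresholds, function names, and the idea of judging freshness by file modification time are assumptions, not something specified in this report.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def newest_age_hours(path, now=None):
    """Return the age of the file at `path` in hours, or None if it cannot be read."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return None
    now = time.time() if now is None else now
    return (now - mtime) / 3600.0


def check_backup_age(age_hours, warn_hours=26, crit_hours=50):
    """Map a backup age to (exit_code, status_line); thresholds are illustrative."""
    if age_hours is None:
        return UNKNOWN, "BACKUP UNKNOWN - no backup file found"
    if age_hours >= crit_hours:
        return CRITICAL, f"BACKUP CRITICAL - last backup {age_hours:.1f}h old"
    if age_hours >= warn_hours:
        return WARNING, f"BACKUP WARNING - last backup {age_hours:.1f}h old"
    return OK, f"BACKUP OK - last backup {age_hours:.1f}h old"


def main(argv):
    """Entry point for a plugin script: print one status line, return the exit code."""
    if len(argv) != 2:
        print("BACKUP UNKNOWN - usage: check_backup_age.py <backup-file>")
        return UNKNOWN
    code, message = check_backup_age(newest_age_hours(argv[1]))
    print(message)
    return code
```

A wrapper script would call `sys.exit(main(sys.argv))`; Nagios reads the printed line as the status text and the exit code as the service state.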

bzimport added a comment. Via Conduit, Apr 29 2009, 5:49 PM

fvassard wrote:

I am currently evaluating a free solution called Amanda to properly take care of backups.
I believe this solution would also offer easy notification of successes / failures and enable us to have the dashboard Tomasz mentioned.
Also 'waiting' on storage to be available, which should theoretically arrive once I have had enough time to work through Amanda's configuration setup.
More to come!

--Fred.

bzimport added a comment. Via Conduit, Jul 5 2009, 5:04 AM

elubarsky wrote:

Considering how valuable and high-quality this data is (and the massive effort the community has put into it), I'm rather concerned that there haven't been many concrete updates to this bug and pretty much none at all to http://wikitech.wikimedia.org/view/Backup_procedures. From that page, images and OTRS data seem to be quite vulnerable even though they are obviously very important to the project. Could we at least have some reassurance?

werdna added a comment. Via Conduit, Jul 29 2009, 4:39 PM

Adding 'tracking' keyword.

MarkAHershberger added a comment. Via Conduit, Mar 6 2011, 9:27 PM

Giving half of Fred's old bugs to Ashar since I trust him to get it done or reassign if he doesn't have time.

hashar added a comment. Via Conduit, Mar 10 2011, 12:29 PM

hexmode>
Tasks coming to mind:

  • determine which items need to be backed up; this bug report provides some
  • identify for each item:
    • a backup solution (rsync, FTP, NAS, ...): incremental, snapshots, or both?
    • off-site, off-foundation requirements
    • backup frequency (daily, weekly, monthly)
    • retention duration (month, year, ...)
  • make sure the backup tasks are documented and monitored
  • make sure operations know where the documents are and train them
  • actually test backup procedures and data from time to time on a test server. This is to ensure the tools and procedures are up to date. There is nothing worse than a bad backup (data filled with zeroes, for example) or wasting 2 hours looking for the documentation or hacking scripts around.
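
The last point, catching a bad backup such as data filled with zeroes, could be partly automated after a test restore. A minimal sketch, assuming each restored file can be compared against a SHA-256 checksum recorded at backup time; the function names and the checksum convention are hypothetical.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # read in 1 MiB chunks so large dumps are not loaded into memory


def inspect_backup(path):
    """Return (size_bytes, all_zero, sha256_hex) for one restored file."""
    digest = hashlib.sha256()
    all_zero = True
    size = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(CHUNK)
            if not chunk:
                break
            size += len(chunk)
            digest.update(chunk)
            # A chunk is all zeroes only if every byte in it equals 0.
            if all_zero and chunk.count(0) != len(chunk):
                all_zero = False
    return size, all_zero, digest.hexdigest()


def restore_problems(path, expected_sha256=None):
    """List reasons the restored file should not be trusted (empty list means it passed)."""
    if not os.path.isfile(path):
        return ["missing file"]
    size, all_zero, sha = inspect_backup(path)
    problems = []
    if size == 0:
        problems.append("empty file")
    elif all_zero:
        problems.append("file contains only zero bytes")
    if expected_sha256 is not None and sha != expected_sha256:
        problems.append("checksum mismatch")
    return problems
```

Running `restore_problems` over every file restored onto the test server gives a quick pass/fail list; it does not replace checking that the restored service actually works.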

I believe this should be raised to CT Woo and assigned to an operations program manager.

Wikitech has some documentation:
https://wikitech.wikimedia.org/view/Backup_procedures

hashar added a comment. Via Conduit, May 21 2011, 10:05 AM

content hidden as private in Bugzilla

hashar added a comment. Via Conduit, Oct 17 2011, 12:43 PM

Resetting this bug to the default assignee. There is not really anything I could do; this is an operations team issue.

Aklapper added a comment. Via Conduit, Nov 10 2012, 4:47 PM

Not sure why this report was marked as "tracking" if nothing has been ever tracked here (no dependency bug reports), plus scope unclear: What would be needed to get this fixed? Sounds rather like a continuous task (not fixable).

Nemo_bis added a comment. Via Conduit, Nov 10 2012, 6:41 PM

(In reply to comment #11)

Not sure why this report was marked as "tracking" if nothing has been ever
tracked here (no dependency bug reports),

Probably because we never found anyone who would bother to create/track the relevant subtasks.

hashar added a comment. Via Conduit, Mar 16 2013, 8:51 PM

The backup system is not tracked in Bugzilla anymore. The Ops team uses RT and has a different set of tools to plan their backup strategy; this bug report is hence rather useless and I am closing it.

bzimport added a comment. Via Conduit, Mar 17 2013, 6:23 AM

elubarsky wrote:

Hi Antoine, could you post a link to where this is now tracked?

Aklapper added a comment. Via Conduit, Mar 17 2013, 11:32 AM

This report was meant as a "tracking" report, but various unrelated requests were added to it directly instead of being filed separately and marked as dependencies. So to me it's already unclear what you'd actually like to "track".

FYI, RT is located at https://rt.wikimedia.org/ (refer to bug 30413 for potential meta-discussions about RT itself).

bzimport added a comment. Via Conduit, Mar 17 2013, 12:19 PM

elubarsky wrote:

A few years ago the situation at https://wikitech.wikimedia.org/wiki/Backup_procedures wasn't nearly as good as it is now. In particular there weren't proper disconnected off-site backups for text & especially images. So I started this report "to make sure text & images are protected from hardware faults, accidental software errors (which can replicate..), and even malicious deletion/corruption by someone gaining access through a security breach."

It's great to see that much progress has been made (thanks to everyone at Wikimedia for that!), but would you say that the risks in my original concern are now minimised as much as possible? I think matters relating to the progress of backup systems should be kept public since they are obviously of concern to Wikipedia editors who'd like their considerable efforts to not get lost.
