Back up of Commons files
Open, Normal, Public

Description

Proposed in Community-Wishlist-Survey-2016. Received 28 support votes, and ranked #54 out of 265 proposals. View full proposal with discussion and votes here.

Problem

Because of various software bugs, misconfiguration or software interactions, files are sometimes lost from Wikimedia Commons. Sometimes they are restored later, but generally only after a long, unpredictable period of time. In many cases they are never restored: the files seem to be permanently lost, or nobody knows how they can be restored. It is often not easy to reupload them from other sources, as the files were modified or created just for use in other Wikimedia wikis and are not stored elsewhere.

Who would benefit

Users of Wikimedia Commons (and other wikis) who use the files. They will find Wikimedia Commons to be a more reliable file store.

Proposed solution

Create a continuous backup of all uploaded files that would allow developers to restore files on community request within a predictable period of time (a few days? a week?).

Technical details

Time, expertise and skills required

  • e.g. 2-3 weeks, advanced contributor, javascript, css, etc

Suitable for

  • e.g. Hackathon, GSOC, Outreachy, etc.

Proposer

Ankry

Related links

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 11 2017, 1:59 AM
Poyekhali renamed this task from Back up of common files to Back up of Commons files. Mar 11 2017, 2:37 AM
Poyekhali added a subscriber: Poyekhali.
Peachey88 updated the task description. Mar 11 2017, 11:53 PM
Zppix added a subscriber: Zppix. Mar 12 2017, 12:17 AM

I'd be willing to help with backing it up.

I have no idea what this request wants from ArchiveTeam/WikiTeam that we aren't already doing (http://archiveteam.org/index.php?title=Wikimedia_Commons ), so I'm removing our tag.

For the WMF side of things, I guess it would be useful to expand/update https://wikitech.wikimedia.org/wiki/Bacula to clarify what is actually being backed up, how easy it is to recover data from the backups and how the backups can be expanded to cover more things (if uploads aren't covered yet).

Seeing Bacula mentioned (I should indeed try to update the wikitech page, although not many things have changed), I just want to point out that Bacula is not designed for this kind of thing, for multiple reasons. A few are:

  • Bacula's design model focuses on backing up files on a predetermined, time-based schedule.
  • The backups are point-in-time in nature.
  • We don't have enough space in the current infrastructure to handle this.

So the Bacula infrastructure is not suitable for this.

I am a bit interested in this statement: "Because of various software bugs, misconfiguration or software interactions". Are these issues identified in the "server" side of things, or are we talking about client-side bugs? For the former (the 4 linked ones in the description of this task fall in this category), we should probably fix these issues. For the latter, aside from the accidental deletion case, I fail to see how server-side backups would help much.

Are these issues identified in the "server" side of things

Yes, mostly issues with Swift.

Dzahn added a subscriber: Dzahn. Edited Mar 16 2017, 4:49 PM

I am wondering how is this task related to "skills required: javascript, css, etc" at all?

I am wondering how is this task related to "skills required: javascript, css, etc" at all?

Apparently this task was filed according to a standard format under the assumption that some volunteer or intern could potentially do something about it. If it's in WMF operations realm, then the task description should be adapted.

Joe added a subscriber: Joe. Mar 19 2017, 10:05 AM

I agree.

Creating a backup of Commons is a huge infrastructural task that would require significant commitment in terms of design and hardware. It would likely need not just operations work but significant development time too, so it is surely not a hackathon-like project at all.

Also, before anyone embarks on such a project, I think we should seek hard data about file losses, their causes, and all other failures. No, this doesn't mean searching for past tickets, but creating more refined monitoring of our file storage and retrieval system, which is in itself not a small task.

And finally: we should concentrate on fixing the related bugs instead of relying on a complex, expensive backup system (that will have its own bugs, failures and inconsistencies as well) to overcome those.

(In terms of pure data redundancy, we do have a synchronized copy of all of our Swift data in the non-active datacenter as well, but that means we can expect files to be ~ 1:1 between copies, and file losses to be propagated if they're due to some MediaWiki bug.)
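To illustrate why a synchronized copy is not the same as a backup, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the inventories are hypothetical dicts mapping file names to content hashes, and the function names are made up. It does not reflect the actual Swift tooling or how MediaWiki tracks files.

```python
def diff_inventories(primary, replica):
    """Compare two inventories (name -> content hash).

    Returns three sorted lists: names missing from the replica,
    names missing from the primary, and names whose hashes differ.
    """
    missing_in_replica = sorted(set(primary) - set(replica))
    missing_in_primary = sorted(set(replica) - set(primary))
    mismatched = sorted(
        name
        for name in set(primary) & set(replica)
        if primary[name] != replica[name]
    )
    return missing_in_replica, missing_in_primary, mismatched


def lost_since(snapshot, current):
    """Names present in an older point-in-time snapshot but absent now."""
    return sorted(set(snapshot) - set(current))


# Hypothetical data: B.png was lost through a software bug, and the
# loss has already been synchronized to the second datacenter.
snapshot = {"A.jpg": "h1", "B.png": "h2", "C.svg": "h3"}
primary = {"A.jpg": "h1", "C.svg": "h3"}
replica = dict(primary)

replica_diff = diff_inventories(primary, replica)
lost = lost_since(snapshot, primary)
```

In this toy case the two copies agree with each other, so comparing them reports nothing, while comparing against the older snapshot shows that `B.png` is gone. A propagated loss only becomes visible against an independent point-in-time record.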

fgiunchedi triaged this task as Normal priority. Apr 12 2017, 8:00 AM

I'd like to know the current backup policy for Commons, the total storage size, and the architecture of the servers/storage.

Can anyone share the info here or point to the relevant links?