
Define priorities for HDFS data to be backed up
Open, High, Public

Description

For the moment HDFS is only available in eqiad, putting us at risk of losing data. This task is about documenting the priorities of datasets to be backed up. With that, we should be able to better plan how much storage we'll need depending on what we wish to save.
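As a minimal sketch of how the size side of this inventory could be gathered (assuming the standard `hdfs` CLI is available on a host with cluster access; the paths listed are hypothetical placeholders, not an agreed-upon dataset list):

```python
#!/usr/bin/env python3
"""Collect total sizes for a set of candidate HDFS datasets.

Assumes the standard `hdfs` CLI is on PATH; the dataset paths below
are hypothetical examples, not the actual list to be prioritized.
"""
import subprocess

# Hypothetical candidate datasets to size up for backup planning.
CANDIDATE_PATHS = [
    "/wmf/data/raw",
    "/wmf/data/wmf",
]

def dataset_size_bytes(path: str) -> int:
    """Return the total (single-copy) size of an HDFS path via `hdfs dfs -du -s`."""
    out = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        check=True, capture_output=True, text=True,
    ).stdout
    # Output format: "<size> <disk-space-consumed> <path>"
    return int(out.split()[0])

if __name__ == "__main__":
    for p in CANDIDATE_PATHS:
        size_tib = dataset_size_bytes(p) / 2**40
        print(f"{p}\t{size_tib:.2f} TiB")
```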

Event Timeline

LSobanski triaged this task as Medium priority. May 20 2021, 5:12 PM
LSobanski moved this task from Triage to Refine on the Data-Persistence-Backup board.

My understanding based on recent conversations was that a decision has yet to be made about what approach to take with HDFS backup / redundancy. Is that still the case and is this a discovery task, or do we have an answer and this is an implementation task? I'm trying to figure out what level of involvement is expected from our side.

@LSobanski You're absolutely right, this task is about documenting on our end the priorities and sizes of datasets to be backed up, so that we can better inform next steps (including potential implementation) later.

@JAllemandou I'd remove Data Persistence from this task; given the team's choices, we'll not get any storage for the next fiscal year (so this task may be confusing for other teams).

odimitrijevic raised the priority of this task from Medium to High.

We have a template spreadsheet for some SRE-maintained datasets so we can keep track of their current properties and state. Would that be a useful tool for you to classify the datasets you maintain?
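For illustration only, a hypothetical sketch of the kind of per-dataset properties such a tracking spreadsheet could capture; the field names are assumptions, not the actual template:

```python
from dataclasses import dataclass

# Hypothetical per-dataset record; field names are illustrative,
# not taken from the actual SRE template spreadsheet.
@dataclass
class DatasetRecord:
    name: str              # e.g. a short dataset identifier
    hdfs_path: str         # root path on HDFS
    owner: str             # maintaining team
    size_tib: float        # current single-copy size
    backup_priority: str   # e.g. "high" / "medium" / "low"
    recoverable: bool      # can it be regenerated from upstream sources?
```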