For the moment HDFS is only available in eqiad, putting us at risk of losing data. This task is about documenting the priorities of datasets to be backed up. With that information we should be able to better plan how much storage we'll need depending on what we wish to save.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | None | | T277015 Evaluate possible solutions to backup Analytics Hadoop's HDFS data
Open | None | | T283261 Define priorities for HDFS data to be backed up
Event Timeline
My understanding based on recent conversations was that a decision is yet to be made about what approach to take with HDFS backup / redundancy. Is that still the case and is this a discovery task or do we have an answer and this is an implementation task? I'm trying to figure out what is the expectation of involvement from our side.
@LSobanski You're absolutely right, this task is about documenting on our end the priorities and sizes of datasets to be backed up, so that we can better inform next steps (including potential implementation) later.
@JAllemandou I'd remove Data Persistence from this task; given the team's choices, we'll not get any storage for the next fiscal year (so this task may be confusing for other teams).
We have a template spreadsheet of some SRE-maintained datasets so we can keep track of their current properties and state. Would that be a useful tool for you to classify your maintained datasets?
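To make the planning step concrete, here is a minimal sketch of how a dataset inventory with priority tiers could feed a storage estimate. All dataset names, sizes, and tier labels below are made up for illustration and are not from the task:

```python
# Hypothetical inventory: names, sizes, and priorities are illustrative only.
datasets = [
    # (name, size in TiB, priority tier)
    ("webrequest_raw",    500, "low"),
    ("mediawiki_history",  20, "high"),
    ("pageview_hourly",    60, "high"),
    ("event_sanitized",   120, "medium"),
]

def storage_needed(datasets, tiers):
    """Total TiB required if we back up only the given priority tiers."""
    return sum(size for _name, size, prio in datasets if prio in tiers)

print(storage_needed(datasets, {"high"}))            # → 80
print(storage_needed(datasets, {"high", "medium"}))  # → 200
```

A spreadsheet with the same columns (name, size, priority) would serve the same purpose; the point is that once each dataset has a size and a tier, the storage requirement for any backup scope is a simple sum.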