For the moment HDFS is only available in eqiad, putting us at risk of losing data. This task is about documenting the priorities of datasets to be backed up. With that information we should be able to better plan how much storage we'll need depending on what we wish to save.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | None | | T277015 Evaluate possible solutions to backup Analytics Hadoop's HDFS data
Open | None | | T283261 Define priorities for HDFS data to be backed up
Event Timeline
My understanding based on recent conversations was that a decision is yet to be made about what approach to take with HDFS backup / redundancy. Is that still the case and is this a discovery task or do we have an answer and this is an implementation task? I'm trying to figure out what is the expectation of involvement from our side.
@LSobanski You're absolutely right, this task is about documenting on our end the priorities and sizes of datasets to be backed up, so that we can better inform next steps (including potential implementation) later.
@JAllemandou I'd remove Data Persistence from this task; given the team's choices, we'll not get any storage for the next fiscal year (so this task may be confusing for other teams).
We have a template spreadsheet of some SRE-maintained datasets so we can keep track of their current properties and state. Would that be a useful tool for you to classify your maintained datasets?
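To make the planning step concrete, here is a minimal sketch of how a dataset inventory with priority tiers could feed a storage estimate. All dataset names, sizes, and tier labels below are made up for illustration and are not from the task:

```python
# Hypothetical inventory: names, sizes, and priorities are illustrative only.
datasets = [
    # (name, size in TiB, priority tier)
    ("webrequest_raw",    500, "low"),
    ("mediawiki_history",  20, "high"),
    ("pageview_hourly",    60, "high"),
    ("event_sanitized",   120, "medium"),
]

def storage_needed(datasets, tiers):
    """Total TiB required if we back up only the given priority tiers."""
    return sum(size for _name, size, prio in datasets if prio in tiers)

print(storage_needed(datasets, {"high"}))            # → 80
print(storage_needed(datasets, {"high", "medium"}))  # → 200
```

A spreadsheet with the same columns (name, size, priority) would serve the same purpose; the point is that once each dataset has a size and a tier, the storage requirement for any backup scope is a simple sum.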