Create a tool checking HDFS data size
Open, Low, Public

Description

Such a tool would allow us to get reports on unexpected data sizes (missing data, too much data) for our datasets.

The tool should:

  • Check multiple folders at once (to allow hourly runs, daily runs, etc.)
  • Be configurable (YAML?)
  • Send a detailed email listing every folder failure found
  • Have a return code based on the result (0 if no error, 1 if any error)
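
The requirements above can be sketched roughly as follows. This is only a minimal illustration, not a proposed design: the config shape, the folder paths, the `min_bytes`/`max_bytes` keys, and all function names are assumptions made for the example. Sizes are read with `hdfs dfs -du -s`, whose first output column is the size in bytes. YAML loading and email sending are left out; the list of failure lines returned here would form the body of the report.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical config shape; in practice it could be loaded from a YAML file.
# Bounds are in bytes; a missing bound means "no limit on that side".
EXAMPLE_CONFIG: Dict[str, Dict[str, int]] = {
    "/hypothetical/dataset/hourly": {"min_bytes": 1_000, "max_bytes": 1_000_000},
    "/hypothetical/dataset/daily": {"min_bytes": 50_000},
}


def hdfs_folder_size(path: str) -> int:
    """Return the size in bytes of an HDFS folder via `hdfs dfs -du -s`."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        check=True,
        capture_output=True,
        text=True,
    )
    # The first whitespace-separated column is the size in bytes.
    return int(result.stdout.split()[0])


def check_folders(
    config: Dict[str, Dict[str, int]],
    get_size: Callable[[str], int] = hdfs_folder_size,
) -> List[str]:
    """Check every configured folder and return one message per failure."""
    failures: List[str] = []
    for path, bounds in config.items():
        try:
            size = get_size(path)
        except Exception as exc:  # an unreadable folder counts as a failure
            failures.append(f"{path}: could not read size ({exc})")
            continue
        low = bounds.get("min_bytes")
        high = bounds.get("max_bytes")
        if low is not None and size < low:
            failures.append(f"{path}: {size} bytes is below minimum {low}")
        elif high is not None and size > high:
            failures.append(f"{path}: {size} bytes is above maximum {high}")
    return failures


def exit_code(failures: List[str]) -> int:
    """Return 0 if no folder failed, 1 otherwise."""
    return 1 if failures else 0
```

A wrapper script could call `check_folders` once per config (one for hourly folders, one for daily ones), email the joined failure lines, and pass `exit_code(failures)` to `sys.exit`. Passing `get_size` as a parameter keeps the size checks testable without a Hadoop cluster.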

Event Timeline

Ottomata moved this task from Incoming to Data Quality on the Analytics board.
odimitrijevic lowered the priority of this task from High to Low. (Jan 6 2022, 4:05 AM)

Hi @JAllemandou, could you also share why this tool is needed? Any use case would help us understand it better. Also, in "Create a tool checking for data presence based on file-size", what data are you referring to?

JAllemandou renamed this task from "Create a tool checking for data presence based on file-size" to "Create a tool checking HDFS data size". (Jul 10 2023, 11:15 AM)
JAllemandou updated the task description.

Hi @Gopavasanth, I updated the task description and title.
Let me know if you would like more details!