Primary Task
Create a Superset dashboard and a Jupyter notebook to extract information from the datasets
The dashboard/notebook should clearly:
- Identify inefficient file storage (large number of small files)
- Show trends in the storage footprint of our datasets
- Monitor cluster capacity and corruption
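As a possible starting point for the notebook, the small-file check and the footprint totals could be sketched in Python as below. This assumes the datasets are reachable as a local or mounted filesystem path. If they live in a cluster filesystem, the `os.walk` listing step would need to be replaced with that system's own file listing. The 16 MiB cutoff, the `min_files`/`min_ratio` thresholds, and all function names are hypothetical illustrations, not existing tooling.

```python
import os
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cutoff: files smaller than this count as "small".
SMALL_FILE_BYTES = 16 * 1024 * 1024  # 16 MiB, an assumed value


@dataclass
class DirStats:
    file_count: int = 0
    small_file_count: int = 0
    total_bytes: int = 0

    @property
    def small_file_ratio(self) -> float:
        # Fraction of files in this directory that are below the cutoff.
        return self.small_file_count / self.file_count if self.file_count else 0.0


def scan_dataset(root: str, small_bytes: int = SMALL_FILE_BYTES) -> dict[str, DirStats]:
    """Walk `root` and aggregate file counts and sizes per directory."""
    stats: dict[str, DirStats] = defaultdict(DirStats)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            s = stats[dirpath]
            s.file_count += 1
            s.total_bytes += size
            if size < small_bytes:
                s.small_file_count += 1
    return dict(stats)


def flag_small_file_dirs(stats: dict[str, DirStats], min_files: int = 100,
                         min_ratio: float = 0.8) -> list[str]:
    """Directories holding many files, most of them small (hypothetical thresholds)."""
    flagged = [d for d, s in stats.items()
               if s.file_count >= min_files and s.small_file_ratio >= min_ratio]
    # Worst offenders first.
    return sorted(flagged, key=lambda d: stats[d].small_file_count, reverse=True)


def footprint_summary(stats: dict[str, DirStats]) -> dict[str, int]:
    """Totals for one scan; record one row per run to chart the footprint over time."""
    return {
        "files": sum(s.file_count for s in stats.values()),
        "small_files": sum(s.small_file_count for s in stats.values()),
        "bytes": sum(s.total_bytes for s in stats.values()),
    }
```

Writing each `footprint_summary` result with a timestamp into a table that Superset can query would give the time series for the footprint trend, while `flag_small_file_dirs` produces a ranked list suitable for a table chart.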