[Airflow] Gather dataset information from DataHub
Open, Needs Triage, Public

Description

Having dataset properties such as granularity, fully qualified table name, base path, etc. available in a structured and unified way
is very valuable in Airflow, since it makes it easier to define data dependencies between DAGs.
It would be very nice if we could gather those properties from DataHub when Airflow starts and then refresh them at a regular interval (every hour?).
This way, we keep DataHub as the single source of truth, avoid duplicating information, and reduce the workload for DAG developers.
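
To make the idea more concrete, here is a minimal sketch of the "load at start, refresh every hour" behaviour. Everything in it is an assumption for illustration: the `DatasetProperties` fields, the `DatasetRegistry` class, and the `fetch` callable are hypothetical names, and the actual call to DataHub is deliberately left out and represented by a function passed in by the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class DatasetProperties:
    """Hypothetical record; the fields mirror the examples in this task."""
    granularity: str
    table_name: str  # fully qualified table name
    base_path: str


class DatasetRegistry:
    """Caches dataset properties and reloads them once they are too old.

    `fetch` stands in for whatever code ends up querying DataHub; it must
    return a mapping of dataset name to DatasetProperties.
    """

    def __init__(
        self,
        fetch: Callable[[], Dict[str, DatasetProperties]],
        refresh_seconds: float = 3600.0,  # "every hour?" from the description
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._datasets: Dict[str, DatasetProperties] = {}
        self._loaded_at: Optional[float] = None

    def _is_stale(self) -> bool:
        # Never loaded yet, or the last load is older than the interval.
        if self._loaded_at is None:
            return True
        return self._clock() - self._loaded_at >= self._refresh_seconds

    def get(self, name: str) -> DatasetProperties:
        """Return the properties of one dataset, reloading the cache if stale."""
        if self._is_stale():
            self._datasets = dict(self._fetch())
            self._loaded_at = self._clock()
        return self._datasets[name]
```

A DAG file could then call something like `registry.get("example_dataset").base_path` instead of hard-coding the path. Refreshing lazily on access is only one option; a scheduled refresh job would work too, and the choice is left open here.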