
[SPIKE] Define process to build out lineage in DataHub
Closed, Resolved | Public | 8 Estimated Story Points

Description

For FY24/25 we have the following hypothesis under SDS1.3:

If we define the process to transfer all datasets and pipeline configurations from the Data Platform to DataHub, we can build tooling to generate lineage documentation automatically.

To do:

  • Define what data artifacts are missing in DataHub and what we might want to add
  • Define how we would want to feed lineage to DataHub

Notes:

  • For Airflow lineage, discuss whether to add things like source tables to datasets.yaml.
  • How would lineage work for other systems?
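
To make the datasets.yaml idea concrete, here is a minimal sketch of how source tables declared per dataset could be turned into upstream/downstream lineage pairs. Everything in it is an assumption for illustration: the `source_tables` key, the table names, and the `hive` platform and `PROD` environment are hypothetical, and the config is shown as an already-parsed dict rather than read from a real file. Only the dataset URN layout follows DataHub's documented format. Actually sending the pairs to DataHub is left out.

```python
# Hypothetical parsed content of a datasets.yaml-style file. The
# "source_tables" key is an assumption, not an existing field.
DATASETS = {
    "example_db.daily_summary": {
        "source_tables": ["event.example_a", "event.example_b"],
    },
    "example_db.monthly_summary": {
        "source_tables": ["example_db.daily_summary"],
    },
}


def dataset_urn(table, platform="hive", env="PROD"):
    """Format a table name as a DataHub dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table},{env})"


def lineage_edges(datasets):
    """Return sorted (upstream_urn, downstream_urn) pairs from the config."""
    edges = []
    for downstream, spec in datasets.items():
        # Datasets without declared sources simply contribute no edges.
        for upstream in spec.get("source_tables", []):
            edges.append((dataset_urn(upstream), dataset_urn(downstream)))
    return sorted(edges)


if __name__ == "__main__":
    for upstream, downstream in lineage_edges(DATASETS):
        print(f"{upstream} -> {downstream}")
```

Keeping the sources next to each dataset definition would let the same file drive both the pipeline configuration and the lineage, which is one way to address the first note. It does not answer the second note about systems outside Airflow.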

Event Timeline

Ahoelzl claimed this task.
Ahoelzl updated the task description.