For FY24/25 we have the following hypothesis under SDS1.3:
If we define the process for transferring all datasets and pipeline configurations from the Data Platform to DataHub, we can build tooling that generates lineage documentation automatically.
To do:
- Define which data artifacts are missing from DataHub and which of them we might want to add
- Define how we would want to feed lineage to DataHub
Notes:
- For Airflow lineage, should we add fields such as source tables to datasets.yaml?
- How would lineage work for other systems?
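
To make the datasets.yaml idea in the notes above more concrete, here is a minimal sketch of how source tables listed per dataset could be turned into upstream/downstream lineage pairs. Everything about the entry shape is an assumption for discussion: the field names `platform`, `name` and `source_tables`, the example table names, and the simplification that sources live on the same platform as the dataset. The dict stands in for the parsed file; only the dataset URN layout follows DataHub's convention.

```python
from typing import Dict, List, Tuple

# Hypothetical content of datasets.yaml after parsing.
# Field names and table names are illustrative assumptions, not the current schema.
DATASETS: Dict[str, dict] = {
    "orders_daily": {
        "platform": "bigquery",
        "name": "analytics.orders_daily",
        "source_tables": ["raw.orders", "raw.customers"],
    },
}


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in the layout DataHub uses for datasets."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def lineage_edges(datasets: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (upstream_urn, downstream_urn) pairs for every declared source table."""
    edges = []
    for entry in datasets.values():
        downstream = dataset_urn(entry["platform"], entry["name"])
        # Assumption: source tables share the platform of the dataset they feed.
        for source in entry.get("source_tables", []):
            edges.append((dataset_urn(entry["platform"], source), downstream))
    return edges


if __name__ == "__main__":
    for upstream, downstream in lineage_edges(DATASETS):
        print(f"{upstream} -> {downstream}")
```

Pairs like these could then be pushed to DataHub by whatever mechanism we choose under the second to-do item. Whether the same declarative approach carries over to systems other than Airflow is still open.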