After successfully delivering Spark-based column-level data lineage for Data Engineering, we want to extend data lineage to all other Data Platform Airflow instances.
https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Airflow/Instances
Stakeholders
- Search - @dcausse
- Research - @fab
- Platform Eng - @Cparle
- Analytics Product - @mpopov
- WMDE - @AndrewTavis_WMDE
- ML - @isarantopoulos
- Structured Data - @Cparle (?)
Implementation Steps
- list out all Airflow instances and stakeholders
- coordinate with stakeholders (Slack)
- for each instance, list its DAGs and mark the ones suitable for data lineage integration
- submit data lineage integration patches for suitable DAGs
- roll out changes and monitor
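
The DAG triage step could be sketched roughly as below. This is only an illustration under stated assumptions: it treats a DAG as "suitable" when it uses a Spark operator (since the existing lineage is Spark-based), the inventory is entered by hand rather than read from Airflow, and every instance name and DAG id in it is a made-up placeholder. The actual suitability criteria still need to be agreed with each stakeholder.

```python
from dataclasses import dataclass

# Assumption: only DAGs that run Spark tasks are candidates, because the
# lineage delivered so far is Spark-based. Adjust this set as criteria evolve.
SPARK_OPERATORS = frozenset({"SparkSubmitOperator", "SparkSqlOperator"})


@dataclass(frozen=True)
class DagInfo:
    """One row of the per-instance DAG inventory."""
    instance: str         # Airflow instance name (placeholder values below)
    dag_id: str           # DAG identifier (placeholder values below)
    operators: frozenset  # operator class names used by the DAG's tasks


def is_suitable(dag: DagInfo) -> bool:
    """A DAG is marked suitable if it uses at least one Spark operator."""
    return bool(dag.operators & SPARK_OPERATORS)


def suitable_by_instance(dags):
    """Group the ids of suitable DAGs by instance.

    Instances with no suitable DAGs are kept with an empty list so they
    remain visible in the overview.
    """
    result = {}
    for dag in dags:
        bucket = result.setdefault(dag.instance, [])
        if is_suitable(dag):
            bucket.append(dag.dag_id)
    return {instance: sorted(ids) for instance, ids in result.items()}


# Hypothetical inventory, for illustration only.
INVENTORY = [
    DagInfo("instance_a", "example_spark_job", frozenset({"SparkSubmitOperator"})),
    DagInfo("instance_a", "example_python_job", frozenset({"PythonOperator"})),
    DagInfo("instance_b", "example_sql_job", frozenset({"SparkSqlOperator", "BashOperator"})),
    DagInfo("instance_c", "example_bash_job", frozenset({"BashOperator"})),
]

if __name__ == "__main__":
    for instance, dag_ids in suitable_by_instance(INVENTORY).items():
        print(f"{instance}: {dag_ids if dag_ids else 'no suitable DAGs'}")
```

Keeping the inventory as plain data means the list can be reviewed with stakeholders before any integration patch is submitted.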