Data Platform has changed the data pipeline scaffolding:
| Before | Now |
| --- | --- |
| Both the Spark jobs and the Airflow DAG lived in the same repository. | They are separated: the Spark jobs live in a standalone repository, while the DAGs live here. |
| A mix of build tools handled packaging and deployment of both the Spark jobs and the Airflow DAG. | GitLab CI and conda distribution are used to package, release, and deploy the Spark jobs (see the sketch below). |
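For illustration, a minimal sketch of what a DAG in this repo might look like under the new split: the DAG only schedules and submits the Spark job, whose artifact is built and released by the Spark-jobs repo's GitLab CI pipeline. This assumes Airflow 2.x with the `apache-airflow-providers-apache-spark` provider installed; the `dag_id`, `conn_id`, and artifact path are hypothetical, not the actual pipeline's values.

```python
# Hypothetical DAG sketch: the Spark job code is NOT in this repo; it is
# packaged via conda and deployed by the Spark-jobs repo's GitLab CI.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="section_topics",            # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_job = SparkSubmitOperator(
        task_id="run_section_topics_job",
        conn_id="spark_default",        # hypothetical Spark connection
        # Points at the artifact shipped by the standalone Spark-jobs
        # repo's release pipeline; this path is an assumption.
        application="/usr/lib/spark-jobs/section_topics.py",
    )
```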
Use the example job to set up the section topics data pipeline repo.