User Story
As a data engineer, I want to begin consolidating our ETL jobs into Airflow so that I can deploy, maintain, and optimise our data pipelines faster.
Scope
We currently have 100 workflows that are triggered and managed by Oozie. Each workflow orchestrates an ordered sequence of steps, which can be of several kinds, including:
- HQL scripts run in Hive
- Spark scripts
- HDFS calls
- Conditional checks
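To size the migration, it may help to count which step kinds each workflow actually uses. The sketch below is a minimal illustration, assuming the workflows are defined in standard Oozie `workflow.xml` files where each `<action>` wraps one typed element (`<hive>`, `<spark>`, `<fs>`) and conditional checks are `<decision>` nodes. The `SAMPLE` workflow, its node names, and the `count_step_kinds` helper are hypothetical and not taken from our jobs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Oozie action element names mapped to the step kinds listed above.
# Anything not listed here is counted as "other".
ACTION_KINDS = {"hive": "hive", "hive2": "hive", "spark": "spark", "fs": "hdfs"}


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{uri:oozie:workflow:0.5}action' -> 'action'."""
    return tag.rsplit("}", 1)[-1]


def count_step_kinds(workflow_xml: str) -> Counter:
    """Count the step kinds used by one Oozie workflow definition."""
    root = ET.fromstring(workflow_xml.strip())
    counts = Counter()
    for node in root:
        name = local_name(node.tag)
        if name == "decision":
            counts["conditional"] += 1
        elif name == "action":
            # The first child that is not a transition names the action type.
            for child in node:
                kind = local_name(child.tag)
                if kind not in ("ok", "error"):
                    counts[ACTION_KINDS.get(kind, "other")] += 1
                    break
    return counts


# Hypothetical workflow used only to exercise the function.
SAMPLE = """
<workflow-app xmlns="uri:oozie:workflow:0.5" name="example-wf">
  <start to="check-input"/>
  <decision name="check-input">
    <switch>
      <case to="load">${fs:exists(inputDir)}</case>
      <default to="end"/>
    </switch>
  </decision>
  <action name="load">
    <hive xmlns="uri:oozie:hive-action:0.2"><script>load.hql</script></hive>
    <ok to="aggregate"/><error to="fail"/>
  </action>
  <action name="aggregate">
    <spark xmlns="uri:oozie:spark-action:0.1"><jar>aggregate.py</jar></spark>
    <ok to="cleanup"/><error to="fail"/>
  </action>
  <action name="cleanup">
    <fs><delete path="${tmpDir}"/></fs>
    <ok to="end"/><error to="fail"/>
  </action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""

print(count_step_kinds(SAMPLE))
```

Running this over all 100 workflow definitions would give a rough inventory of how many Hive, Spark, HDFS, and conditional steps need an Airflow equivalent. Sub-workflows, forks, and other action types are lumped into "other" or ignored here and would need separate handling.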
Goal
Transition all of these workflows from Oozie to Airflow, so that Airflow schedules and manages them.
Next Steps
- Identify low-risk, low-complexity jobs that new team members can begin migrating.
- Create a migration plan for the more complex and higher-risk jobs.
- Identify current Oozie jobs that could be redesigned or refactored when moved to Airflow.
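One possible way to make the first two steps repeatable is to score each job against explicit thresholds. The sketch below is illustrative only: the `JobProfile` fields, the thresholds, the bucket names, and the example jobs are all assumptions that the team would need to replace with agreed criteria.

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    name: str
    step_count: int            # total number of steps in the workflow
    has_conditionals: bool     # whether the workflow uses conditional checks
    downstream_consumers: int  # assumed proxy for risk: who depends on the output


def triage(job: JobProfile, max_steps: int = 5, max_consumers: int = 1) -> str:
    """Return 'starter' for low-risk, low-complexity jobs, else 'needs-plan'."""
    low_complexity = job.step_count <= max_steps and not job.has_conditionals
    low_risk = job.downstream_consumers <= max_consumers
    return "starter" if low_complexity and low_risk else "needs-plan"


def group_by_bucket(jobs: list[JobProfile]) -> dict[str, list[str]]:
    """Group job names by triage bucket, preserving input order."""
    buckets: dict[str, list[str]] = {"starter": [], "needs-plan": []}
    for job in jobs:
        buckets[triage(job)].append(job.name)
    return buckets


# Hypothetical jobs, used only to show the grouping.
JOBS = [
    JobProfile("daily_export", step_count=3, has_conditionals=False, downstream_consumers=0),
    JobProfile("customer_rollup", step_count=12, has_conditionals=True, downstream_consumers=4),
    JobProfile("log_cleanup", step_count=2, has_conditionals=False, downstream_consumers=1),
]

print(group_by_bucket(JOBS))
```

Jobs in the "starter" bucket would be candidates for new team members, while "needs-plan" jobs would feed the migration plan. Redesign candidates are a separate judgement and are not modelled here.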
Success Criteria
- All our Oozie jobs have been moved into our Airflow instance.
- Oozie is no longer required to schedule data jobs.
Open questions / remarks
- Do we have all the required operators?
- Who needs to validate that the pipeline is working as intended?
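The first open question could be checked directly on the Airflow host. The sketch below probes whether a candidate class for each step kind can be imported in the current environment. The import paths are the Airflow 2.x and Apache provider package locations as far as we know, and the choice of candidate per step kind is an assumption (for example, HDFS calls might instead be run through a shell command), so both should be confirmed against the installed Airflow version.

```python
import importlib

# Assumed candidate class per step kind (Airflow 2.x style import paths).
CANDIDATES = {
    "hive": "airflow.providers.apache.hive.operators.hive.HiveOperator",
    "spark": "airflow.providers.apache.spark.operators.spark_submit.SparkSubmitOperator",
    "hdfs": "airflow.providers.apache.hdfs.hooks.webhdfs.WebHDFSHook",
    "conditional": "airflow.operators.python.BranchPythonOperator",
}


def is_available(dotted_path: str) -> bool:
    """Return True if the module imports and exposes the named attribute."""
    module_name, _, attr = dotted_path.rpartition(".")
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import-time failure is treated as "not available" for this probe.
        return False
    return hasattr(module, attr)


def coverage_report(candidates: dict[str, str]) -> dict[str, bool]:
    """Map each step kind to whether its candidate class is importable here."""
    return {kind: is_available(path) for kind, path in candidates.items()}


for kind, ok in coverage_report(CANDIDATES).items():
    print(f"{kind}: {'available' if ok else 'MISSING'}")
```

A "MISSING" line usually means the corresponding provider package is not installed in that environment, not that no operator exists.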