We, Analytics and Platform Engineering, are tasked with building infrastructure to support data pipelines of the form: "crunch some data and publish the results". We chose Airflow for the scheduling part of this infrastructure, and we will track progress here as we:
- catalog and categorize existing and potential future jobs
- build a minimal set of clean modular templates that can handle these jobs
- iterate implementing and deploying one job at a time, learning as we go
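As a sketch of what one of these templates might look like (the names and structure below are illustrative assumptions, not our actual code), every job in this family reduces to a crunch step feeding a publish step, which in Airflow terms would become two tasks with a dependency between them:

```python
from typing import Any, Callable


def make_pipeline(crunch: Callable[[], Any],
                  publish: Callable[[Any], None]) -> Callable[[], None]:
    """Wire a 'crunch some data, publish the results' job from two steps.

    In an Airflow DAG, crunch and publish would each map to a task,
    and this wrapper would be replaced by the task dependency.
    """
    def run() -> None:
        results = crunch()    # compute the data
        publish(results)      # ship it to its destination
    return run


# Hypothetical job built from the template
job = make_pipeline(
    crunch=lambda: sum(range(10)),              # stand-in computation
    publish=lambda r: print(f"published {r}"),  # stand-in sink
)
job()
```

The point of the template is that new jobs only supply the two callables; scheduling, retries, and monitoring would live in the shared Airflow layer.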
This can serve as a parent task for any related work, so progress and technical details stay visible in one place.