We want to deploy the latest version of Airflow (2.9.1) to at least the analytics instance, for two reasons:
- This version can display a meaningful name in the UI for each dynamic task instance. Currently, the UI shows only an integer, the index of the dynamic task instance. For example, for the Refine source sensor, we would see sense_mediawiki_reading_depth instead of 0.
- This version allows depth-first execution within a dynamic DAG. Instead of waiting for all sensors to pass before launching any ETL job, we can queue each ETL job as soon as its matching sensor instance succeeds.
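
To make the second point concrete, here is a small toy model of when each ETL job could start under the two behaviours. It is only a sketch: it does not use Airflow or reproduce its scheduler. The source name mediawiki_reading_depth comes from the example above; the other source names, the sensor finish times, and the function names are hypothetical.

```python
# Toy model only: plain Python, no Airflow involved.
# Sensor finish times (in minutes) are made-up illustration values.
from typing import Dict, List, NamedTuple


class Source(NamedTuple):
    name: str
    sensor_done_at: int  # minute at which this source's sensor succeeds


SOURCES: List[Source] = [
    Source("mediawiki_reading_depth", 5),
    Source("hypothetical_source_b", 30),
    Source("hypothetical_source_c", 12),
]


def etl_start_times_wait_for_all(sources: List[Source]) -> Dict[str, int]:
    """Current behaviour: every ETL job waits for the slowest sensor."""
    barrier = max(s.sensor_done_at for s in sources)
    return {s.name: barrier for s in sources}


def etl_start_times_depth_first(sources: List[Source]) -> Dict[str, int]:
    """Depth-first: each ETL job is queued once its own sensor succeeds."""
    return {s.name: s.sensor_done_at for s in sources}


if __name__ == "__main__":
    waiting = etl_start_times_wait_for_all(SOURCES)
    depth_first = etl_start_times_depth_first(SOURCES)
    for source in SOURCES:
        saved = waiting[source.name] - depth_first[source.name]
        print(f"{source.name}: ETL starts {saved} min earlier with depth-first")
```

In this model, only the source with the slowest sensor gains nothing; every other ETL job starts earlier by the gap between its own sensor and the slowest one.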
Rough process:
- Prepare a deb package with the airflow-dag GitLab CI
- Test on the test cluster, on the analytics_test instance (including airflow db migrate)
- Deploy to the analytics instance and monitor it
- Progressively deploy to the other instances