Can Airflow substitute all of our various scheduling tools:
- reportupdater
- oozie
- spark refine
- some systemd timers
- and ONE more!
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | odimitrijevic | T282033 Airflow collaborations
Resolved | | odimitrijevic | T271429 Replace Oozie with better workflow scheduler
Resolved | | None | T217059 Spike [2019-2020 work] Oozie Replacement. Airflow Study / Argo Study
I just read a bunch of Airflow docs, and I'm not really sure how easy it will be to replace oozie. For all the others, it might be great! I haven't yet seen an ability to trigger runs based on dataset existence, but perhaps I'm just missing it.
I also just noticed that it has (experimental?) 'lineage' support, which helps with keeping track of data lineage and governance, and integrates with Apache Atlas. This might be relevant to some Better Use of Data use cases.
Ok, Joseph clued me into Airflow Sensors, which do indeed seem to do what we need.
https://github.com/apache/airflow/tree/master/airflow/sensors
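To make the "trigger on dataset existence" idea concrete, here is a rough sketch in plain Python of the poke-and-wait pattern that sensors are built around. This is not Airflow's API: `poke`, `wait_for_dataset`, and the `_SUCCESS` done-flag name are all assumptions made for illustration, and a local directory stands in for HDFS.

```python
import time
from pathlib import Path


def poke(dataset_dir: Path, flag_name: str = "_SUCCESS") -> bool:
    """Return True once the dataset's done-flag file exists.

    The "_SUCCESS" flag name is an assumption for this sketch.
    """
    return (dataset_dir / flag_name).is_file()


def wait_for_dataset(
    dataset_dir: Path,
    poke_interval: float = 1.0,
    timeout: float = 10.0,
) -> bool:
    """Poll until the dataset is present or the timeout expires.

    Returns True if the dataset showed up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if poke(dataset_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)
```

A downstream job would only start once `wait_for_dataset(...)` returns True, which is roughly what we rely on Oozie dataset dependencies for today.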
FYI, RelEng is considering using Argo for CI in Kubernetes. Argo looks like it has some similarities with Airflow:
Since the search team is managing a trial airflow setup, perhaps we should use their setup for this spike? We could try to replicate some existing use cases in Airflow. Perhaps:
These are a bit different, but cover a lot of what we do with oozie and systemd timers. It'd be a good sign if we can make Airflow do both well.
The Search's setup is very custom and not really re-usable IIUC. It would be really great to spend a bit of time trying to improve what is currently in puppet and how Airflow is deployed (currently directly via scap in a Search gerrit repo, together with their code).
I like the idea of testing the above use cases, especially if we find a unified way to alarm. For example, the way that oozie notifies us about a problem is still an email, that is not great as we know, meanwhile timers leverage icinga.
> The Search's setup is very custom and not really re-usable IIUC
I wouldn't want to actually replace our oozie&timer stuff, just try to do so and see if we can run things writing into scratch directories.
> For example, the way that oozie notifies us about a problem is still an email, that is not great as we know, meanwhile timers leverage icinga.
I somehow doubt icinga will be the answer for us. Icinga doesn't allow for dynamic lasting alert statuses. In our current system, Hue is almost acting like Icinga for dataset generating jobs. We get an email from Oozie about a job failure, and then Hue shows us what has or hasn't failed. The systemd timer alerts in icinga only alert us on the most recent status of a job run.
But yeah, perhaps the Airflow UI will replace Oozie+Hue in a better way.
What would Icinga look like if the webrequest load job had failures for 6 hourly datasets spread over the last month? We'd want an 'alert' on each of these failures.
Every job instance is an individual thing we'd want alerting on.
Or we could use an aggregator that would say "at least one job failed", and then use the Airflow UI to find the failures like we do with Hue (but I'm not sure if that is possible or ideal).
Oh yeah, that would be good too! I just mean we wouldn't want to rely only on the aggregate alerts; Icinga won't work for us as the main alerting solution.
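To illustrate the difference between the three alerting styles discussed above, here is a small sketch in plain Python. Everything in it is hypothetical (the `JobRun` record, the function names, the sample job name); it only models "latest status per job" (what the timer alerts give us), "one alert per failed instance", and "a single aggregate alert".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class JobRun:
    job: str           # hypothetical job name, e.g. "webrequest_load"
    dataset_hour: str  # ISO-like hour, e.g. "2021-05-01T03"; sorts as text
    succeeded: bool


def latest_status_alert(runs: Iterable[JobRun]) -> Dict[str, bool]:
    """Timer-style: only the most recent run of each job decides the alert."""
    latest: Dict[str, JobRun] = {}
    for run in sorted(runs, key=lambda r: r.dataset_hour):
        latest[run.job] = run  # later hours overwrite earlier ones
    return {job: not run.succeeded for job, run in latest.items()}


def per_instance_alerts(runs: Iterable[JobRun]) -> List[JobRun]:
    """One alert per failed job instance, however old it is."""
    return [run for run in runs if not run.succeeded]


def aggregate_alert(runs: Iterable[JobRun]) -> bool:
    """A single alert: did at least one job instance fail?"""
    return any(not run.succeeded for run in runs)


runs = [
    JobRun("webrequest_load", "2021-05-01T03", succeeded=False),
    JobRun("webrequest_load", "2021-05-01T04", succeeded=True),
]
print(latest_status_alert(runs))       # the old failure is hidden
print(len(per_instance_alerts(runs)))  # the old failure is still reported
print(aggregate_alert(runs))           # something failed, but not which hour
```

With the sample data, the latest-status view reports no problem because the 04 hour succeeded, while the per-instance view still surfaces the failed 03 hour. That is the gap an aggregate alert plus a UI listing of failed instances would have to cover.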