
[Airflow] Refactor jobs to not use DAG factories
Closed, Resolved (Public)

Description

This task started as an effort to unify two DAG factories that did very similar things.
During the unification, however, we decided it was better not to use either of those DAG factories at all,
and instead to use Sensors and Operators directly in the DAG file.

The reason is that the DAG factories in question each encapsulated only a small number of tasks,
and we still had to pass most of those tasks' parameters to the DAG factory,
so the factories did not substantially reduce the number of lines of code in the DAG.
The DAG is also more readable when it uses Sensors and Operators directly.
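
A minimal, library-free sketch of the trade-off described above. `Sensor`, `Operator`, `sensor_and_query_factory`, and every parameter name and value are hypothetical stand-ins; they are neither Airflow classes nor the code from this task. The sketch only illustrates that a factory wrapping two tasks ends up with a signature that mirrors the parameters of those tasks.

```python
from dataclasses import dataclass, field
import inspect


@dataclass
class Sensor:
    """Hypothetical stand-in for an Airflow sensor."""
    task_id: str
    path: str
    poke_interval: int


@dataclass
class Operator:
    """Hypothetical stand-in for an Airflow operator."""
    task_id: str
    sql: str
    conf: dict
    upstream: list = field(default_factory=list)


def sensor_and_query_factory(sensor_id, path, poke_interval, query_id, sql, conf):
    """Factory style: wraps two tasks, but still takes most of their parameters."""
    sensor = Sensor(sensor_id, path, poke_interval)
    query = Operator(query_id, sql, conf, upstream=[sensor])
    return [sensor, query]


def direct_style():
    """Direct style: the same two tasks, declared where the reader can see them."""
    sensor = Sensor(task_id="wait_for_data", path="/data/in", poke_interval=60)
    query = Operator(task_id="run_query", sql="SELECT 1", conf={}, upstream=[sensor])
    return [sensor, query]


factory_tasks = sensor_and_query_factory(
    "wait_for_data", "/data/in", 60, "run_query", "SELECT 1", {}
)
direct_tasks = direct_style()

# The factory needs one argument per task parameter it forwards.
factory_param_count = len(inspect.signature(sensor_and_query_factory).parameters)
```

Both styles build the same two tasks. The factory call saves few lines while hiding which sensor and operator are involved, which is the readability cost mentioned above.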

Event Timeline

EChetty renamed this task from Unifying HDFS Sensors to Investigate unifying SparkSQLRunner DAG templates. Feb 23 2022, 2:13 PM
EChetty updated the task description.
mforns renamed this task from Investigate unifying SparkSQLRunner DAG templates to [Airflow] Refactor jobs to not use DAG factories. Mar 24 2022, 2:29 PM
mforns updated the task description.

The related merge request:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/38

All DAGs have been refactored and tested.

This change also includes the DynamicConfig module, which makes it easier to test the DAGs in the dev environment.

I'm still working on the DAG unit tests.

Don't the anomaly detection DAGs still use a DAG factory?

Yes, they do. I didn't change them, because in this case the use of a factory is justified.
However, the factory should be a TaskGroup factory rather than a DAG factory,
and the sensors should not be part of that TaskGroup factory.
We should make that change.
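
A rough, library-free sketch of the proposed shape, assuming a factory that returns only a group of processing tasks while the DAG file declares the sensor itself. `Task`, `TaskGroup`, `anomaly_detection_group`, and the `traffic` / `pageviews` names are hypothetical stand-ins, not Airflow's `TaskGroup` API or the actual anomaly detection code.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for an Airflow sensor or operator."""
    task_id: str
    upstream: list = field(default_factory=list)


@dataclass
class TaskGroup:
    """Hypothetical stand-in for a task group: a named bundle of tasks."""
    group_id: str
    tasks: list


def anomaly_detection_group(group_id, metric):
    """TaskGroup-style factory: builds only the processing tasks.

    It creates neither a DAG nor any sensors; those stay in the DAG file.
    """
    compute = Task(f"{group_id}.compute_{metric}")
    alert = Task(f"{group_id}.alert_{metric}", upstream=[compute.task_id])
    return TaskGroup(group_id, [compute, alert])


# In the DAG file: the sensor is declared explicitly and wired to the
# first task of the group returned by the factory.
sensor = Task("wait_for_source_data")
group = anomaly_detection_group("traffic", "pageviews")
group.tasks[0].upstream.append(sensor.task_id)

task_ids = [sensor.task_id] + [t.task_id for t in group.tasks]
```

With this split, the repeated processing steps are shared through the factory, while each DAG file still shows which data it waits for.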