Upgrade Airflow to 2.9.2
Open, High, Public

Description

We want to deploy the latest version of Airflow (2.9.1) to (at least) the analytics instance, because:

  • This version can display a meaningful name in the UI for each generated dynamic task instance. Currently, the UI shows only an integer, the index of the dynamic task instance. E.g., for the Refine source sensor, the label would change from 0 to sense_mediawiki_reading_depth.
  • This version allows depth-first execution within a dynamic DAG. Instead of waiting for all sensors to pass before launching all the ETL jobs, we can queue each ETL job as soon as its matching sensor instance succeeds.
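The difference between the two scheduling orders can be sketched in plain Python. This is only a toy model of the behaviour described above, not Airflow code: the function name etl_jobs_ready and the dataset name other_dataset are made up for illustration.

```python
def etl_jobs_ready(sensor_done: dict[str, bool], depth_first: bool) -> list[str]:
    """Return the datasets whose ETL job may be queued right now."""
    if depth_first:
        # Each ETL job depends only on its own sensor instance.
        return [name for name, done in sensor_done.items() if done]
    # Breadth-first: every sensor must pass before any ETL job starts.
    return list(sensor_done) if all(sensor_done.values()) else []


# Only one of the two sensors has succeeded so far.
# "other_dataset" is a hypothetical name used for illustration.
sensors = {"mediawiki_reading_depth": True, "other_dataset": False}

print(etl_jobs_ready(sensors, depth_first=False))  # []
print(etl_jobs_ready(sensors, depth_first=True))   # ['mediawiki_reading_depth']
```

With breadth-first ordering, one slow sensor holds back every ETL job; with depth-first ordering it delays only its own.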

Rough process:

  • Prepare a deb package with the airflow-dags GitLab CI
  • Test on the test cluster, on the analytics_test instance (with airflow db migrate)
  • Deploy and monitor analytics instance
  • Progressively deploy on other instances

Details

Title: Upgrade airflow to version 2.9.2
Reference: repos/data-engineering/airflow-dags!727
Author: stevemunene
Source Branch: airflow_version_2_9_1
Dest Branch: main

Event Timeline

@Antoine_Quhen: Assuming this task is about Data-Engineering, hence adding that project tag so other people can also find this task when searching via projects or looking at workboards. Please set appropriate project tags when possible. Thanks!

Hit some package conflicts during the build process, detailed below:

The conflict is caused by:
    The user requested fsspec==2022.11.0
    apache-airflow 2.9.1 depends on fsspec>=2023.10.0

I will update the fsspec pin to the required version and watch for any further conflicts or breakage.
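For reference, the conflict boils down to an exact pin that falls below a lower bound. A minimal sketch of that comparison, assuming purely numeric dotted versions (the helper names are made up here; real tooling should rely on the packaging library or on pip's own resolver):

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '2023.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Return True when an exact pin meets a '>=' lower bound."""
    return parse_version(pinned) >= parse_version(minimum)


# Values taken from the pip error message above.
current_pin = "2022.11.0"
required_minimum = "2023.10.0"

print(satisfies_minimum(current_pin, required_minimum))  # False
print(satisfies_minimum("2023.10.0", required_minimum))  # True
```

Any pin at or above 2023.10.0 clears this particular constraint; other packages in the environment may still impose their own bounds.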

The manual package changes are the ones already mentioned for workflow_utils and conda-pack, plus the autogenerated python-graphviz==0.20.3 entry, changed to graphviz==0.20.3 to match what we use.

After the changes mentioned above, running ./check_conda_environment_lock_yml.sh from the airflow-dags root folder reported success:

+ set +x
An environment could be created form conda-environment.lock.yml

Getting multiple failures in the tests for the skein submit hook.

The Python tests were fixed with this commit, huge thanks to @Antoine_Quhen: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/727/diffs?commit_id=37183a9ea3ac5d68b93ebf2fe31f64988e6116ac

Gehel renamed this task from Upgrade Airflow to 2.9.1 to Upgrade Airflow to 2.9.2. (Tue, Jun 18, 3:13 PM)

Added rules to mark airflow upgrade branches that follow the airflow_version_* convention as protected, but the build job for the 2.9.2 version still seems to be stuck.
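As a quick sanity check of which branch names the convention covers, here is a small sketch using shell-style wildcard matching from the standard library. It assumes GitLab's protected-branch wildcard behaves like a plain `*` glob for these names; airflow_version_2_9_2 is a hypothetical branch name, not one confirmed in this task.

```python
from fnmatch import fnmatchcase

# Wildcard taken from the naming convention described above.
PROTECTED_PATTERN = "airflow_version_*"


def is_upgrade_branch(branch: str) -> bool:
    """Return True when the branch name follows the airflow_version_* convention."""
    return fnmatchcase(branch, PROTECTED_PATTERN)


# "airflow_version_2_9_2" is a hypothetical name used for illustration.
for branch in ("airflow_version_2_9_1", "airflow_version_2_9_2", "main"):
    print(branch, is_upgrade_branch(branch))
```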

Locally, all tests pass, so this should be ready for test deployment:

+ set +x
An environment could be created form conda-environment.lock.yml
(airflow-dags) ➜  airflow-dags git:(airflow_version_2_9_1) ✗ flake8                                       
(airflow-dags) ➜  airflow-dags git:(airflow_version_2_9_1) ✗ mypy
Success: no issues found in 261 source files
(airflow-dags) ➜  airflow-dags git:(airflow_version_2_9_1) ✗ black --check .

All done! ✨ 🍰 ✨
263 files would be left unchanged.
(airflow-dags) ➜  airflow-dags git:(airflow_version_2_9_1) ✗ isort --check .
Skipped 42 files
(airflow-dags) ➜  airflow-dags git:(airflow_version_2_9_1) ✗ pytest                        
[2024-06-20T15:29:27.208+0300] {db.py:1649} INFO - Dropping tables that exist
[2024-06-20T15:29:30.661+0300] {migration.py:216} INFO - Context impl SQLiteImpl.
[2024-06-20T15:29:30.661+0300] {migration.py:219} INFO - Will assume non-transactional DDL.
[2024-06-20T15:29:30.663+0300] {migration.py:216} INFO - Context impl SQLiteImpl.
[2024-06-20T15:29:30.663+0300] {migration.py:219} INFO - Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running stamp_revision  -> 1949afb29106
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
============================= test session starts ==============================
platform darwin -- Python 3.10.14, pytest-7.3.2, pluggy-1.0.0
rootdir: /Users/smunene/airflow-dags
configfile: pyproject.toml
testpaths: tests, wmf_airflow_common
plugins: mock-3.10.0, anyio-4.4.0, time-machine-2.14.1, cov-3.0.0
collected 1455 items

.
.
.
.


===================== 1443 passed, 12 skipped in 23.41s ======================
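The same sequence of local checks could be scripted so that it stops at the first failure. This is a rough sketch, not something that exists in the airflow-dags repo: the commands are the ones run above, while run_checks and its runner parameter are made up for illustration.

```python
import subprocess

# Commands as run above from the airflow-dags root folder.
CHECKS = [
    ["flake8"],
    ["mypy"],
    ["black", "--check", "."],
    ["isort", "--check", "."],
    ["pytest"],
]


def run_checks(checks, runner=subprocess.run):
    """Run each command in order; return the first failing command, or None."""
    for command in checks:
        result = runner(command)
        if result.returncode != 0:
            # Stop early so the failing tool's output is the last thing shown.
            return command
    return None
```

Calling run_checks(CHECKS) inside the activated airflow-dags environment returns None when every tool exits with status 0. The runner parameter is only there so the function can be exercised without the tools installed.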