
Upgrade platform_eng Airflow instance to 2.5.1
Closed, Resolved, Public

Description

Airflow 2.5.1 has now been merged into the main branch of airflow-dags.

The sooner we upgrade and get in sync with the rest of the teams, the easier this upgrade will be. We think this should take ~1 hour, with the DAGs paused while the database is upgraded.
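
For reference, here is a minimal Python sketch of the pause, upgrade, unpause window described above. It is not the runbook actually followed. `upgrade_plan` is a hypothetical helper that only builds Airflow 2.x CLI commands (`airflow dags pause`, `airflow db upgrade`, `airflow dags unpause`) and does not run them. It assumes the Airflow 2.5.1 packages are already installed before the DB step.

```python
from typing import Iterable, List, Set


def upgrade_plan(all_dag_ids: Iterable[str], already_paused: Set[str]) -> List[List[str]]:
    """Hypothetical helper: build (not run) the CLI commands for the window."""
    # Only touch DAGs that are currently active, so DAGs that were paused
    # before the maintenance window stay paused afterwards.
    to_pause = sorted(set(all_dag_ids) - set(already_paused))
    plan: List[List[str]] = []
    for dag_id in to_pause:
        plan.append(["airflow", "dags", "pause", dag_id])
    # Migrate the metadata database schema to the installed Airflow version.
    plan.append(["airflow", "db", "upgrade"])
    for dag_id in to_pause:
        plan.append(["airflow", "dags", "unpause", dag_id])
    return plan


if __name__ == "__main__":
    for cmd in upgrade_plan(["dag_a", "dag_b", "dag_c"], already_paused={"dag_b"}):
        print(" ".join(cmd))
```

Recording which DAGs were already paused means the unpause step leaves them alone, rather than blindly unpausing everything after the upgrade.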

We currently run Airflow 2.1.2. For Structured Data Engineering purposes, the immediate changes are mostly cosmetic. Later on, we should be able to leverage new Airflow features.

Event Timeline

Scheduled for Thursday March 16 @ 16:00 UTC.

Preemptively paused all DAGs just now.

OK, this has been done now.

We followed P43199 and P43200.

Follow up items after Airflow 2.5.1 upgrade on platform_eng:

  • It seems we lost history for 2 DAGs, while one DAG does have all of its history. @Antoine_Quhen, is this something recoverable?
  • We have some ‘dangling’ tables that we need to take care of. @BTullis to follow up and remove?
  • We have an INFO stack trace on most sensors and operator logs that perhaps should be investigated:
[2023-03-16, 17:40:51 UTC] {logging_mixin.py:137} INFO - Exception: Traceback (most recent call last):
  File "/usr/lib/airflow/lib/python3.10/site-packages/datahub_provider/_plugin.py", line 281, in custom_on_success_callback
    datahub_task_status_callback(context, status=InstanceRunResult.SUCCESS)
  File "/usr/lib/airflow/lib/python3.10/site-packages/datahub_provider/_plugin.py", line 134, in datahub_task_status_callback
    .get_underlying_hook()
  File "/usr/lib/airflow/lib/python3.10/site-packages/datahub_provider/hooks/datahub.py", line 189, in get_underlying_hook
    conn = self.get_connection(self.datahub_conn_id)
  File "/usr/lib/airflow/lib/python3.10/site-packages/airflow/hooks/base.py", line 72, in get_connection
    conn = Connection.get_connection_from_secrets(conn_id)
  File "/usr/lib/airflow/lib/python3.10/site-packages/airflow/models/connection.py", line 435, in get_connection_from_secrets
    raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined")
airflow.exceptions.AirflowNotFoundException: The conn_id `datahub_rest_default` isn't defined

@Milimetric to follow up.
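
The traceback comes from the DataHub provider's success callback asking Airflow for the connection `datahub_rest_default`, which none of the configured backends define. As a rough illustration of one place Airflow looks, the sketch below mirrors the environment-variable backend's naming convention (`AIRFLOW_CONN_` plus the upper-cased conn_id). `conn_env_var_name` and `find_conn_uri` are hypothetical helpers. A real deployment may instead keep connections in the metadata DB or another secrets backend, and whether the right fix is defining the connection or disabling the DataHub callbacks is left to the follow-up.

```python
import os
from typing import Mapping, Optional

# Prefix used by Airflow's environment-variable secrets backend.
CONN_ENV_PREFIX = "AIRFLOW_CONN_"


def conn_env_var_name(conn_id: str) -> str:
    """Name of the environment variable Airflow would check for conn_id."""
    return CONN_ENV_PREFIX + conn_id.upper()


def find_conn_uri(conn_id: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: return the URI from the environment, or None.

    Only mirrors the environment-variable backend; it does not consult the
    metadata DB or any other configured secrets backend.
    """
    env = os.environ if env is None else env
    return env.get(conn_env_var_name(conn_id))


if __name__ == "__main__":
    print(conn_env_var_name("datahub_rest_default"))
    # None here corresponds to the "isn't defined" error, for this backend only.
    print(find_conn_uri("datahub_rest_default", env={}))
```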

If we need tickets for these items, let me know, folks.

xcollazo changed the task status from Open to In Progress. Mar 16 2023, 6:51 PM

No history was lost. Some DAGs have been renamed: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/760f31789ee20f3e6e263fa4733ff51202fa52a0

So new DAGs were created when we deployed the latest version of airflow-dags. In other words, the migration was not the problem.

That said, there is some work to do to reconcile both histories, if needed.
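
Since Airflow keys run history by `dag_id`, a renamed DAG simply starts a new history while the old runs stay under the previous id. Here is a minimal, read-only sketch of how the two histories could be viewed together, assuming a hypothetical `renames` mapping (old id to new id) taken from the commit above. It works on plain tuples and does not touch the metadata DB.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A run is represented here as (dag_id, logical_date); hypothetical shape.
Run = Tuple[str, str]


def merge_histories(runs: Iterable[Run], renames: Dict[str, str]) -> Dict[str, List[str]]:
    """Group run dates under the current dag_id, folding old ids into new ones."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for dag_id, logical_date in runs:
        merged[renames.get(dag_id, dag_id)].append(logical_date)
    return {dag_id: sorted(dates) for dag_id, dates in merged.items()}


if __name__ == "__main__":
    runs = [("old_dag", "2023-03-01"), ("new_dag", "2023-03-17"), ("other", "2023-03-02")]
    print(merge_histories(runs, {"old_dag": "new_dag"}))
```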

Ah, yes, I had forgotten about those renames. Never mind, no need to mess with the DB to recover those.

Just opened T332820 and T332822 to take care of the remaining issues as time allows.

Closing this one. Thank you all for the help!