Upgrade Puppet code to make Airflow configuration files compatible with version 2.5.0
Closed, Resolved · Public · 5 Estimated Story Points

Description

The goals are:

  • fix breaking changes
  • avoid deprecation warnings
  • add the datahub kafka test connection config to puppet

Notice: When switching between versions, we need to run airflow db upgrade.
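
As a minimal sketch of the notice above, a deployment helper could decide whether the migration command needs to run by comparing the installed and target versions. The function names and the `dry_run` flag are hypothetical and not part of the Puppet code; only the `airflow db upgrade` command comes from this task.

```python
import subprocess
from typing import List, Optional


def db_upgrade_command(old_version: str, new_version: str) -> Optional[List[str]]:
    """Return the migration command if the Airflow version changed, else None."""
    if old_version == new_version:
        return None
    # `airflow db upgrade` is the command named in the task description.
    return ["airflow", "db", "upgrade"]


def run_db_upgrade(old_version: str, new_version: str,
                   dry_run: bool = True) -> Optional[List[str]]:
    """Run the migration when needed; with dry_run=True only report the command."""
    cmd = db_upgrade_command(old_version, new_version)
    if cmd is not None and not dry_run:
        # check=True raises CalledProcessError if the migration fails.
        subprocess.run(cmd, check=True)
    return cmd
```

With the default `dry_run=True` the helper only returns the command it would run, so it can be exercised without an Airflow installation.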

Event Timeline

Change 827526 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):

[operations/puppet@production] Update Puppet files for Airflow Upgrade to 2.3.2

https://gerrit.wikimedia.org/r/827526

We should make sure the latest version of the airflow deb does not ship this build of zlib: zlib 1.2.12 h7f8727e_1. The ..._1 upload is no longer available on the conda forge.

Currently, when cloning env, we get:
CondaHTTPError: HTTP 404 NOT FOUND for url <https://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.12-h7f8727e_1.conda>

here: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/run_dev_instance.sh#L243

Temporary fix is to copy the missing file from an existing environment:

mkdir -p /tmp/aqu2/.conda/envs
cp -R ~/.conda/envs/airflow_development /tmp/aqu2/.conda/envs/
chmod -R 777 /tmp/aqu2
sudo -u analytics-privatedata ./run_dev_instance.sh -m /tmp/aqu2 analytics-test
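
To check whether a packaged environment still pins the unavailable zlib build, one option is to scan an explicit package list for it. The sketch below assumes the list is in the format printed by `conda list --explicit` (one package URL per line); the function name is hypothetical.

```python
from typing import List

# Build named in the CondaHTTPError above.
MISSING_BUILD = "zlib-1.2.12-h7f8727e_1"


def find_pinned_builds(explicit_spec: str, build: str = MISSING_BUILD) -> List[str]:
    """Return the spec lines whose package file name matches the given build."""
    hits = []
    for line in explicit_spec.splitlines():
        line = line.strip()
        # Skip blanks, comments, and the "@EXPLICIT" marker line.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        filename = line.rsplit("/", 1)[-1]
        # Require the extension separator so "..._1" does not match "..._10".
        if filename.startswith(build + "."):
            hits.append(line)
    return hits
```

An empty result means the environment does not reference the removed build.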

Hey all - Just a note from the Security-Team and @MoritzMuehlenhoff - Airflow should be bumped to 2.3.4 so as to avoid introducing https://nvd.nist.gov/vuln/detail/CVE-2022-38054.

In addition 2.3.4 will also address CVE-2022-38170: https://www.openwall.com/lists/oss-security/2022/09/02/3

EChetty set the point value for this task to 5. (Sep 6 2022, 10:08 AM)
BTullis renamed this task from Puppet change for Airflow Upgrade to Upgrade Airflow configuration file in puppet to be compatible with version 2.3.4. (Dec 7 2022, 11:30 AM)
BTullis updated the task description.

Change 867668 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] Use Airflow 2.4.3 + Postgres in test-cluster

https://gerrit.wikimedia.org/r/867668

Antoine_Quhen renamed this task from Upgrade Airflow configuration file in puppet to be compatible with version 2.3.4 to Upgrade Puppet code to make Airflow configuration files compatible with version 2.3.4. (Jan 4 2023, 2:38 PM)
Antoine_Quhen updated the task description.

@Stevemunene This should be doable with minimal puppet code changes, I believe only hiera data changes are needed.

Change 878128 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] Update analytics_text conf compatibility with airflow2.3.4 connect postgresql

https://gerrit.wikimedia.org/r/878128

@Ottomata some changes were needed on the $airflow_config hash to remove the deprecated sql_alchemy_conn from [core]; it now lives in [database]. This is also reflected in hieradata.

Here are the last modifications to add to the airflow configuration in the puppet code.

Configuration changes to airflow.cfg:

# Rename dag_concurrency to max_active_tasks_per_dag
# And remove sql_alchemy_conn + load_default_connections
[core]
# sql_alchemy_conn = mysql://airflow_data_engineering_dev:password@an-db1001.eqiad.wmnet:5432/airflow_data_engineering_dev
# load_default_connections = False
# dag_concurrency = 6
max_active_tasks_per_dag = 6

# Move 2 parameters from [core] to [database]
[database]
sql_alchemy_conn = postgresql://airflow_data_engineering_dev:password@an-db1001.eqiad.wmnet:5432/airflow_data_engineering_dev
load_default_connections = False

# Rename auth_backend to auth_backends with an `s`
[api]
#auth_backend = airflow.api.auth.backend.default
auth_backends = airflow.api.auth.backend.default

# New block to add
[datahub]
enabled = False
conn_id = datahub_kafka_test
cluster = test

Configuration changes in connections.yml:

analytics-test-hive:
  conn_type: hive_metastore
  host: analytics-test-hive.eqiad.wmnet
  port: 9083
  extra_dejson:
    # Rename authMechanism to auth_mechanism
    auth_mechanism: GSSAPI

# Add the following connection
datahub_kafka_test:
  conn_type: datahub_kafka
  host: kafka-test1006.eqiad.wmnet:9092
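
The two connections.yml changes above can be sketched as a transformation on the parsed file. The example assumes the YAML has already been loaded into a dict keyed by connection name; loading and dumping the file is left out, and the function name is hypothetical.

```python
import copy
from typing import Any, Dict

# New connection, with values taken from the comment above.
DATAHUB_KAFKA_TEST = {
    "conn_type": "datahub_kafka",
    "host": "kafka-test1006.eqiad.wmnet:9092",
}


def upgrade_connections(connections: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Rename authMechanism to auth_mechanism and add datahub_kafka_test."""
    updated = copy.deepcopy(connections)  # leave the caller's data untouched
    for conn in updated.values():
        extra = conn.get("extra_dejson")
        if isinstance(extra, dict) and "authMechanism" in extra:
            value = extra.pop("authMechanism")
            # Keep an existing auth_mechanism value if one is already set.
            extra.setdefault("auth_mechanism", value)
    # Only add the connection if it is not defined yet.
    updated.setdefault("datahub_kafka_test", dict(DATAHUB_KAFKA_TEST))
    return updated
```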

Change 887735 had a related patch set uploaded (by Btullis; author: Btullis):

[labs/private@master] Add some dummy tokens for the airflow_test database

https://gerrit.wikimedia.org/r/887735

Change 887735 merged by Btullis:

[labs/private@master] Add some dummy tokens for the airflow_test database

https://gerrit.wikimedia.org/r/887735

We were able to recreate the airflow.cfg and connections.yml as described in the comment above. We enlisted help with password/secrets management for the PostgreSQL connection and resolved it. We are now refactoring the puppet code to provide this more cleanly, as requested in review.

BTullis renamed this task from Upgrade Puppet code to make Airflow configuration files compatible with version 2.3.4 to Upgrade Puppet code to make Airflow configuration files compatible with version 2.5.0. (Feb 15 2023, 2:05 PM)

The Puppet code now renders an Airflow configuration compatible with whichever Airflow version is provided. This will be updated once all instances run the same upgraded Airflow version.
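
To illustrate the idea of version-dependent configuration (this is a Python sketch, not the actual Puppet implementation), the option names and sections could be chosen from the Airflow version. The version thresholds below are assumptions about when each rename landed upstream and should be checked against the Airflow release notes; the function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def option_locations(airflow_version: str) -> Dict[str, Tuple[str, str]]:
    """Map each logical option to the (section, key) used by that version."""
    v = parse_version(airflow_version)
    # ASSUMPTION: [database] section and auth_backends from 2.3.0 onwards.
    db_section = "database" if v >= (2, 3, 0) else "core"
    auth_key = "auth_backends" if v >= (2, 3, 0) else "auth_backend"
    # ASSUMPTION: max_active_tasks_per_dag replaces dag_concurrency from 2.2.0.
    tasks_key = "max_active_tasks_per_dag" if v >= (2, 2, 0) else "dag_concurrency"
    return {
        "sql_alchemy_conn": (db_section, "sql_alchemy_conn"),
        "load_default_connections": (db_section, "load_default_connections"),
        "max_active_tasks_per_dag": ("core", tasks_key),
        "auth_backends": ("api", auth_key),
    }
```

The parser only handles plain numeric versions; pre-release suffixes would need extra handling.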

Change 867668 abandoned by Ottomata:

[operations/puppet@production] Use Airflow 2.4.3 + Postgres in test-cluster

Reason:

Work being done in https://gerrit.wikimedia.org/r/c/operations/puppet/+/878128/36..48

https://gerrit.wikimedia.org/r/867668

Change 878128 merged by Nicolas Fraison:

[operations/puppet@production] Update airflow conf compatibility with airflow 2.5.0 connect postgresql

https://gerrit.wikimedia.org/r/878128

Change 827526 abandoned by Ottomata:

[operations/puppet@production] Update Puppet files for Airflow Upgrade to 2.3.2

Reason:

https://gerrit.wikimedia.org/r/827526