
Review the Airflow instance security settings to ensure that they are still suitable
Closed, ResolvedPublic

Description

Current situation

Each of our 7 Airflow instances is deployed on its own VM, except for the analytics instance, which is deployed to a bare-metal host, an-launcher1002.

The web server for the Airflow UI and the API is deployed in such a way that it listens on all addresses, but access is restricted to $ANALYTICS_NETWORKS.
See: https://github.com/wikimedia/operations-puppet/blob/production/modules/airflow/manifests/instance.pp#L111-L113

# [*ferm_srange*]
#   ferm srange on which to allow access to Airflow (really just the airflow-webserver port).
#   Default: $ANALYTICS_NETWORKS

This was originally configured here: https://github.com/wikimedia/operations-puppet/commit/cd1630676cde68eb4c050af3cd0270f4d9fe425e
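
To make the effect of the srange concrete, here is a minimal sketch of the kind of source-address check the firewall rule performs. It is not the ferm implementation, and the networks listed are documentation-only placeholder ranges standing in for $ANALYTICS_NETWORKS, whose real values are not given in this task.

```python
import ipaddress

# Placeholder ranges (RFC 5737 / RFC 3849 documentation prefixes).
# These are assumptions standing in for the real $ANALYTICS_NETWORKS.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def source_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Note that a check like this only answers "is the caller on a permitted network?"; it says nothing about which user on that network made the request.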

This is convenient for some users who are members of analytics-privatedata-users and can therefore log onto stat servers, but who happen not to be members of the service_group that gives them SSH access to the Airflow instances themselves.
These service groups are:

  • analytics
  • analytics-search
  • analytics-wmde
  • analytics-research
  • analytics-platform-eng
  • analytics-product

They each correspond with their respective instances. The analytics group permits access to the analytics and analytics_test instances.

The webservers do not provide authentication or authorization of incoming requests, so anyone who has access to a stat server (or any other server in $ANALYTICS_NETWORKS) could theoretically execute any action on any DAG on any instance. For example: deleting a DAG, changing its configuration, running any DAG, pausing a DAG. We would not have any reliable record of who carried out the action.
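
As an illustration of why no reliable record exists, the sketch below builds (but does not send) a request to pause a DAG. It assumes the Airflow 2 stable REST API is enabled on the webserver; the host name, port, and DAG id are hypothetical. The point is that nothing in the request identifies the caller.

```python
import json
import urllib.request


def build_pause_request(base_url: str, dag_id: str) -> urllib.request.Request:
    """Build a PATCH request that would pause a DAG via the stable REST API."""
    body = json.dumps({"is_paused": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/dags/{dag_id}",
        data=body,
        method="PATCH",
        # No Authorization header: the request carries no user identity.
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, port, and DAG id, for illustration only.
req = build_pause_request("http://airflow.example.invalid:8600", "example_dag")
```

With no credentials attached, the server would see only a source address inside $ANALYTICS_NETWORKS, which may be a shared stat server.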

Given that we have production pipelines in Airflow now, should we review the rights available to see if they are still appropriate?

n.b.

Airflow has a very capable security model built in.

We should think about both short-term and long-term approaches to addressing this issue.

Personally, I would like to move towards running Airflow under Kubernetes and removing the individual VMs from the picture, but we might want a short-term mitigation of this issue in the meantime.

Event Timeline

Gehel triaged this task as Medium priority. Feb 23 2024, 9:19 AM
Gehel moved this task from Incoming to Security on the Data-Platform-SRE board.
BTullis claimed this task.

I think that we can close this issue, as it will be superseded by the work on T374948: Migrate airflow webservers to Kubernetes, T364387: Adapt Airflow auth and DAG deployment method, and T375716: Ensure the Airflow API can be reached out to from within Kubernetes and is authenticated.
In summary, we won't be changing the network access to the Airflow webserver and API while they are still running on the virtual machines.
However, we will shortly be moving these services to Kubernetes and implementing authentication. There will be a public-facing URL for each instance for access to the webserver, but this will only be accessible with OIDC authentication from the CAS/SSO system. The API will only be available internally, and will use either basic or Kerberos authentication (TBD).
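
Since the API authentication method is still TBD, the following is only a sketch of what the basic-auth option might look like from a client's side. The URL, user name, and password are hypothetical, and the request is built but not sent.

```python
import base64
import urllib.request


def build_authenticated_request(
    url: str, username: str, password: str
) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical internal URL and service credentials, for illustration only.
auth_req = build_authenticated_request(
    "http://airflow.example.invalid/api/v1/dags", "svc-user", "s3cret"
)
```

Unlike the current setup, a request of this shape names the account that made it, which is what would allow actions to be attributed.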