Configure airflow webserver under Kubernetes to use OIDC authentication
Closed, Resolved (Public)

Description

As part of our project T362788: Migrate Airflow to the dse-k8s cluster, we would like to improve the security model for Airflow so that it uses the CAS/SSO system as its login authentication source.

Specifically, we would like to use the OIDC protocol.

Given that the Airflow webserver is a Flask application built with FAB (Flask AppBuilder), the process of configuring OIDC should be quite similar to the one we undertook for Superset in T353794: Configure OIDC Authentication for Superset on K8S.

Event Timeline

Gehel triaged this task as High priority. Jul 9 2024, 8:03 AM
Gehel moved this task from Incoming to Scratch on the Data-Platform-SRE board.

According to the Airflow: High-Availability Strategy, the proposed solution is to create a public URL for each instance, for example https://airflow-analytics.wikimedia.org and https://airflow-research.wikimedia.org.
This will require IdP entries for every instance listed at https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Airflow/Instances (URLs pending confirmation):

airflow-analytics
airflow-search
airflow-platform-eng
airflow-research
airflow-analytics-product
airflow-wmde

Written up on T371208

Since the test instances are the first to be deployed, we shall start with the airflow-analytics-test domains and OIDC setup.

@Stevemunene @bking @brouberol - I think we decided to create a new Airflow instance called test-k8s to support this testing phase, didn't we?
See the Airflow on Kubernetes - Test and migration plan doc for details.

I'm fine with leaving the airflow-analytics-test DNS records in place because we will need them in due course, but I think we will need additional configs for this new test-k8s instance.

Change #1062048 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/dns@master] dns: provision airflow-test-k8s temp domain

https://gerrit.wikimedia.org/r/1062048

Change #1062048 merged by Stevemunene:

[operations/dns@master] dns: provision airflow-test-k8s temp domain

https://gerrit.wikimedia.org/r/1062048

Change #1063195 had a related patch set uploaded (by Bking; author: Bking):

[operations/deployment-charts@master] WIP: airflow: implement SSO auth

https://gerrit.wikimedia.org/r/1063195

Change #1063848 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] trafficserver: add airflow-test-k8s discovery record

https://gerrit.wikimedia.org/r/1063848

Change #1063848 merged by Stevemunene:

[operations/puppet@production] trafficserver: add airflow-test-k8s discovery record

https://gerrit.wikimedia.org/r/1063848

Change #1069166 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] Update airflow-test-k8s image to include authlib

https://gerrit.wikimedia.org/r/1069166

Change #1069166 merged by jenkins-bot:

[operations/deployment-charts@master] Update airflow-test-k8s image to include authlib

https://gerrit.wikimedia.org/r/1069166

We have set up an OIDC config for testing in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1063195, which follows the standard Flask procedure with FAB, as also used in Superset.
The OIDC redirect is working, but authenticating with CAS fails with the error "Application Not Authorized to Use CAS", as shown below:

image.png (728×3 px, 158 KB)

After some digging, this turns out to be due to how we defined the redirect URL, as the CAS log shows:

/var/log/cas# tail -f cas.log | grep "ERROR"
2024-09-04 08:49:27,407 ERROR [org.apereo.cas.support.oauth.util.OAuth20Utils] - <Unsupported [redirect_uri]: [https://airflow-test-k8s.wikimedia.org/oauth-authorized/CAS] does not match what is defined for registered service: [https://airflow-test-k8s\.wikimedia\.org\/[\w\/]*]. Service is considered unauthorized. Verify the service matching strategy used in the service definition is correct and does in fact match the client [https://airflow-test-k8s.wikimedia.org/oauth-authorized/CAS]>
2024-09-04 08:49:27,412 ERROR [org.apereo.cas.support.oauth.util.OAuth20Utils] - <Unsupported [redirect_uri]: [https://airflow-test-k8s.wikimedia.org/oauth-authorized/CAS] does not match what is defined for registered service: [https://airflow-test-k8s\.wikimedia\.org\/[\w\/]*]. Service is considered unauthorized. Verify the service matching strategy used in the service definition is correct and does in fact match the client [https://airflow-test-k8s.wikimedia.org/oauth-authorized/CAS]>
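
One possible reading of the log above, offered as a sketch rather than a confirmed diagnosis: the registered service pattern ends in the character class `[\w\/]*`, which does not include a hyphen, while the redirect path `oauth-authorized/CAS` contains one. The snippet below reproduces the mismatch with Python's `re` module. It assumes CAS requires the whole URI to match the pattern (full-match semantics); the widened pattern is a hypothetical illustration, not necessarily the fix that was applied.

```python
import re

# Both values are copied from the CAS log lines above.
REDIRECT_URI = "https://airflow-test-k8s.wikimedia.org/oauth-authorized/CAS"
REGISTERED_PATTERN = r"https://airflow-test-k8s\.wikimedia\.org\/[\w\/]*"

# Hypothetical widened pattern: the same expression with "-" added to the
# character class. This is an illustration, not the actual fix.
WIDENED_PATTERN = r"https://airflow-test-k8s\.wikimedia\.org\/[\w\/-]*"


def is_authorized(pattern: str, uri: str) -> bool:
    """Return True if the whole URI matches the pattern (assumed CAS behaviour)."""
    return re.fullmatch(pattern, uri) is not None


def unmatched_chars(path: str) -> list:
    """List the characters in a path that the class [\\w/] does not cover."""
    return sorted({ch for ch in path if not re.fullmatch(r"[\w/]", ch)})


if __name__ == "__main__":
    print(is_authorized(REGISTERED_PATTERN, REDIRECT_URI))
    print(unmatched_chars("oauth-authorized/CAS"))
    print(is_authorized(WIDENED_PATTERN, REDIRECT_URI))
```

A check like this can be run against any candidate service pattern before it is registered, using the callback path FAB generates for the provider (here `/oauth-authorized/CAS`).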

Implementing a fix for this

Change #1063195 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: implement SSO auth

https://gerrit.wikimedia.org/r/1063195

We have set up SSO authentication via CAS for the Airflow webserver on our initial airflow-test-k8s instance. New users are given a primary role, and there is an admin group as well. This will act as a solid foundation for the other Airflow instances we will deploy.
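
For readers who have not looked at the change itself, a FAB OAuth/OIDC setup of this kind usually looks roughly like the sketch below. This is not the merged configuration from change 1063195: the IdP URL, the environment variable names, the client ID, the group name and the role names are all placeholders assumed for illustration. Only the provider name `CAS` is taken from the `/oauth-authorized/CAS` redirect URI in the CAS log.

```python
# Hypothetical webserver_config.py fragment for Airflow with FAB.
# All values are placeholders, not the configuration that was deployed.
import os

try:
    # Constant provided by Flask-AppBuilder when it is installed.
    from flask_appbuilder.const import AUTH_OAUTH
except ImportError:
    # Fallback so the sketch can be read and run without FAB installed.
    AUTH_OAUTH = 4

# Authenticate through an external OAuth2/OIDC provider.
AUTH_TYPE = AUTH_OAUTH

# Create an Airflow user on first login and give it a default role.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Viewer"  # assumed default role

# Re-evaluate role membership on every login.
AUTH_ROLES_SYNC_AT_LOGIN = True

# Map IdP groups to Airflow roles. The group name is hypothetical.
AUTH_ROLES_MAPPING = {
    "example-airflow-admins": ["Admin"],
}

OAUTH_PROVIDERS = [
    {
        # FAB builds the callback as /oauth-authorized/<name>.
        "name": "CAS",
        "token_key": "access_token",
        "icon": "fa-key",
        "remote_app": {
            # Hypothetical environment variable names and default values.
            "client_id": os.environ.get("OIDC_CLIENT_ID", "airflow_example"),
            "client_secret": os.environ.get("OIDC_CLIENT_SECRET", ""),
            # Placeholder discovery URL, not the real IdP endpoint.
            "server_metadata_url": (
                "https://idp.example.org/oidc/.well-known/openid-configuration"
            ),
            "client_kwargs": {"scope": "openid profile email"},
        },
    }
]
```

Because `CAS` is not one of the provider names FAB recognises out of the box, a setup like this typically also needs a custom security manager that maps the IdP's userinfo response to username, email and group fields; that part is omitted here.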