As part of "Migrate Airflow to the dse-k8s cluster" - T362788
We will migrate the webserver components first, then the schedulers once we have carried out more testing.
| Gehel | |
| May 7 2024, 1:15 PM |
| Title | Reference | Author | Source Branch | Dest Branch |
|---|---|---|---|---|
| Don't propagate the container SPARK_HOME to the hadoop workers | repos/data-engineering/airflow-dags!929 | brouberol | T364389 | main |
| Ensure SPARK_HOME=/usr/lib/spark3 | repos/data-engineering/airflow!43 | brouberol | T364389 | main |
| Install missing libhdfs | repos/data-engineering/airflow!42 | brouberol | T364389 | main |
| Install libsasl libraries (take 3) | repos/data-engineering/airflow!41 | brouberol | T364389 | main |
| Install libsasl libraries (take 2) | repos/data-engineering/airflow!40 | brouberol | T364389 | main |
| Install libsasl libraries | repos/data-engineering/airflow!39 | brouberol | T364389 | main |
| Add missing dependencies | repos/data-engineering/airflow!38 | brouberol | T364389 | main |
| Cleanup scheduler logs as part of the purge_old_logs_from_s3 DAG | repos/data-engineering/airflow-dags!928 | brouberol | cleanup-scheduler-logs | main |
Change #1075165 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] WIP: enable Kubernetes executor
As the Airflow scheduler will need to be able to create/get/list/delete/watch Pods as part of the task lifecycle, it will need to have the associated permissions through a dedicated ServiceAccount. However, experience has shown that we can't create Role or ClusterRole resources within a chart, as the deploy role cannot manage them. These resources must be defined in admin_ng.
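To make the permission set concrete, here is a minimal sketch of what such a rule could look like, built as a plain Python dictionary and printed as JSON. It only illustrates the verbs listed above. The resource name `airflow-scheduler`, the namespace `airflow-test`, and the choice of a namespaced `Role` are assumptions for the example, not the definitions that ended up in admin_ng.

```python
import json

# Verbs named in the comment above: what the scheduler needs to manage task Pods.
POD_VERBS = ["create", "get", "list", "delete", "watch"]


def build_scheduler_role(namespace: str, name: str = "airflow-scheduler") -> dict:
    """Return a namespaced Role manifest granting Pod lifecycle permissions.

    The default name is hypothetical and only used for illustration.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where Pods live
                "resources": ["pods"],
                "verbs": POD_VERBS,
            }
        ],
    }


if __name__ == "__main__":
    # "airflow-test" is a hypothetical namespace name.
    print(json.dumps(build_scheduler_role("airflow-test"), indent=2))
```

A ServiceAccount would still need to be bound to this rule set, which is exactly the part the deploy role cannot manage from within a chart.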
Having talked to @JMeybohm, it appears that a middle-ground solution could be:
Change #1075508 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] Deploy an airflow-scheduler ClusteRole to dse-k8s-eqiad
After some testing, it appears that the deploy user cannot handle RoleBinding resources either, making the previously stated solution a dead end.
Change #1075508 merged by Brouberol:
[operations/deployment-charts@master] Specify a custom deploy clusterrole for airflow namespaces in dse
We have been able to run our first task on Kubernetes! While the task is still running, its logs are tailed via the Kube API; they are then uploaded to s3.
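For readers unfamiliar with how this is wired up on the Airflow side, the sketch below builds the environment variables that would select the Kubernetes executor and enable remote logging to s3. The `AIRFLOW__{SECTION}__{KEY}` naming scheme and the option names are standard Airflow configuration. The bucket name and connection id are placeholders, and this is not necessarily how the chart sets these values.

```python
def airflow_env(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def kubernetes_executor_env(bucket: str, conn_id: str) -> dict:
    """Return env vars enabling KubernetesExecutor with task logs stored in s3.

    `bucket` and `conn_id` are supplied by the caller; the values used in the
    example below are hypothetical.
    """
    return {
        airflow_env("core", "executor"): "KubernetesExecutor",
        airflow_env("logging", "remote_logging"): "True",
        airflow_env("logging", "remote_base_log_folder"): f"s3://{bucket}/logs",
        airflow_env("logging", "remote_log_conn_id"): conn_id,
    }


if __name__ == "__main__":
    # Placeholder bucket and connection id, for illustration only.
    for name, value in kubernetes_executor_env("example-logs", "s3_example").items():
        print(f"{name}={value}")
```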
Change #1075165 merged by Brouberol:
[operations/deployment-charts@master] Enable the usage of the Kubernetes executor
Work has been done to allow us to migrate from LocalExecutor to KubernetesExecutor. I'm going to send this task back to our backlog, as the actual migration won't happen for a while.
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/928
Cleanup scheduler logs as part of the purge_old_logs_from_s3 DAG
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/38
Add missing dependencies
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/928
Cleanup scheduler logs as part of the purge_old_logs_from_s3 DAG
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/38
Add missing dependencies
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/39
Install libsasl libraries
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/39
Install libsasl libraries
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/40
Install libsasl libraries (take 2)
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/40
Install libsasl libraries (take 2)
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/41
Install libsasl libraries (take 3)
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/41
Install libsasl libraries (take 3)
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/42
Install missing libhdfs
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/42
Install missing libhdfs
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/43
Ensure SPARK_HOME=/usr/lib/spark3
brouberol closed https://gitlab.wikimedia.org/repos/data-engineering/airflow/-/merge_requests/43
Ensure SPARK_HOME=/usr/lib/spark3
brouberol opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/929
Don't propagate the container SPARK_HOME to the hadoop workers
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/929
Don't propagate the container SPARK_HOME to the hadoop workers
Change #1097424 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] global_config: open port 8600 (webserver) for airflow services
Change #1097424 merged by Brouberol:
[operations/puppet@production] global_config: open port 8600 (webserver) for airflow services
Change #1097430 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] airflow: enable port 8600 to be reached from Kubernetes
Change #1097430 merged by Brouberol:
[operations/puppet@production] airflow: enable port 8600 to be reached from Kubernetes