
Migrate the airflow-analytics-test scheduler to Kubernetes
Closed, Resolved, Public

Description

Now that the webserver is running in Kubernetes, we'll migrate the kerberos and scheduler components to Kubernetes as well, and check that everything still works as expected.

  • Create the S3 user for the airflow logs
  • Create the S3 bucket for the airflow logs
  • Store the S3 keys in the private repo
  • Stop the kerberos and scheduler systemd services on an-test-client1002
  • Copy all airflow logs to the S3 log bucket
  • Deploy airflow-analytics-test with kerberos, scheduler and remote logging enabled
  • Disable the airflow systemd services on the airflow host
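For the "remote logging enabled" step, the following is a minimal sketch of the Airflow side of the configuration. It assumes Airflow 2.3 or later with the Amazon provider installed in the image, and that the chart passes settings through Airflow's standard `AIRFLOW__SECTION__KEY` and `AIRFLOW_CONN_<ID>` environment variables. The connection ID `s3_dpe` and the `https://` scheme on the RGW endpoint are hypothetical; the bucket name, endpoint host and region come from the `s3cmd mb` command in the timeline.

```shell
# Sketch only: remote task logs written to the Ceph RGW bucket.
# ASSUMPTION: connection ID "s3_dpe" and the https:// scheme are hypothetical.
# access_key / secret_key are expected to hold the RGW key pair (e.g. from the private repo).

export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER="s3://logs.airflow-analytics-test.dse-k8s-eqiad"
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID="s3_dpe"

# Airflow resolves connection "s3_dpe" from AIRFLOW_CONN_S3_DPE (JSON form).
# For the "aws" connection type, login/password carry the access/secret key pair.
# S3 keys are alphanumeric plus "/" and "+", so plain printf interpolation stays valid JSON.
export AIRFLOW_CONN_S3_DPE="$(printf '{"conn_type": "aws", "login": "%s", "password": "%s", "extra": {"endpoint_url": "https://rgw.eqiad.dpe.anycast.wmnet", "region_name": "dpe"}}' \
    "$access_key" "$secret_key")"
```

In a Helm chart these would typically be rendered into the pod environment from a Secret rather than exported by hand; the shell form just makes the key names explicit.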

Event Timeline

brouberol@cephosd1001:~$ sudo radosgw-admin user create --uid=airflow-analytics-test --display-name="airflow-analytics-test"
{
    "user_id": "airflow-analytics-test",
    "display_name": "airflow-analytics-test",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "airflow-analytics-test",
            "access_key": "REDACTED",
            "secret_key": "REDACTED"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
brouberol@stat1008:~$ s3cmd --access_key=$access_key --secret_key=$secret_key --host=rgw.eqiad.dpe.anycast.wmnet --region=dpe --host-bucket=no mb s3://logs.airflow-analytics-test.dse-k8s-eqiad
Bucket 's3://logs.airflow-analytics-test.dse-k8s-eqiad/' created
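The same five s3cmd flags are repeated in every command against the DPE RGW endpoint. A small wrapper keeps the endpoint and region in one place; this is a sketch, and the function name `dpe_s3` is hypothetical. It expects `access_key` and `secret_key` to be set from the `keys` array returned by `radosgw-admin user create` above.

```shell
# Hypothetical helper: forwards its arguments to s3cmd with the DPE RGW settings.
RGW_HOST=rgw.eqiad.dpe.anycast.wmnet
RGW_REGION=dpe

dpe_s3() {
    s3cmd \
        --access_key="$access_key" \
        --secret_key="$secret_key" \
        --host="$RGW_HOST" \
        --region="$RGW_REGION" \
        --host-bucket=no \
        "$@"
}
```

For example, `dpe_s3 ls` lists the buckets owned by the `airflow-analytics-test` user, which is a quick way to confirm the bucket creation, and `dpe_s3 ls s3://logs.airflow-analytics-test.dse-k8s-eqiad/` lists its contents.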
brouberol changed the task status from Open to In Progress. Nov 19 2024, 4:01 PM

Change #1093177 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow-analytics-test: deploy the scheduler and kerberos components

https://gerrit.wikimedia.org/r/1093177

brouberol@an-test-client1002:/srv/airflow-analytics_test/logs$ s3cmd \
    --access_key=$access_key \
    --secret_key=$secret_key \
    --host=rgw.eqiad.dpe.anycast.wmnet \
    --region=dpe \
    --host-bucket=no \
    sync -r ./* s3://logs.airflow-analytics-test.dse-k8s-eqiad/
...
Done. Uploaded 3914855088 bytes in 886.2 seconds, 4.21 MB/s.
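The summary line can be sanity-checked from its own figures: the reported 4.21 MB/s only matches if "MB" means MiB (1,048,576 bytes), since decimal megabytes would give about 4.42. A small awk sketch (the function name `summarize_transfer` is hypothetical) reproduces both the transferred volume and the rate:

```shell
# Recompute size and throughput from the bytes/seconds printed by "s3cmd sync".
# $1 = bytes transferred, $2 = elapsed seconds.
summarize_transfer() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        mib = 1024 * 1024
        printf "%.2f GiB at %.2f MiB/s\n", b / (mib * 1024), b / s / mib
    }'
}

summarize_transfer 3914855088 886.2
# prints: 3.65 GiB at 4.21 MiB/s
```

So roughly 3.65 GiB of historical logs were copied to the bucket in just under 15 minutes.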
brouberol@an-test-client1002 $ sudo puppet agent --disable "brouberol: WIP migrating the airflow scheduler to Kubernetes"
brouberol@an-test-client1002 $ sudo systemctl stop airflow-{webserver,kerberos,scheduler}@analytics_test.service
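The brace expression in the `systemctl stop` command expands to three unit names, one per component. A POSIX sh sketch that prints the same list (the function name `airflow_units` is hypothetical, and `analytics_test` is the instance name from the command above) makes it easy to reuse for follow-up checks:

```shell
# Prints the unit names that airflow-{webserver,kerberos,scheduler}@<instance>.service
# expands to. The body runs in a subshell so "instance" and "component" do not leak.
airflow_units() (
    instance=${1:-analytics_test}
    for component in webserver kerberos scheduler; do
        printf 'airflow-%s@%s.service\n' "$component" "$instance"
    done
)
```

For instance, `systemctl is-active $(airflow_units)` prints one state per unit and should show all three as inactive after the stop, and `sudo systemctl disable $(airflow_units)` would be the manual equivalent of the "Disable the airflow systemd services" step, which this task handled through Puppet instead. Since Puppet was disabled on the host beforehand, it needs `sudo puppet agent --enable` once the Puppet changes are merged.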

Change #1093364 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Airflow: add missing hive connections

https://gerrit.wikimedia.org/r/1093364

Change #1093365 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow: upgrade base image

https://gerrit.wikimedia.org/r/1093365

Change #1093366 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow: allow multiple DAG folders to be pulled in

https://gerrit.wikimedia.org/r/1093366

Change #1093368 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] an-test-client1002: ensure that airflow services are absent

https://gerrit.wikimedia.org/r/1093368

Change #1093364 merged by jenkins-bot:

[operations/deployment-charts@master] Airflow: add missing hive connections

https://gerrit.wikimedia.org/r/1093364

Change #1093365 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: upgrade base image

https://gerrit.wikimedia.org/r/1093365

Change #1093368 merged by Brouberol:

[operations/puppet@production] an-test-client1002: ensure that airflow services are absent

https://gerrit.wikimedia.org/r/1093368

Change #1093373 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] an-test-client1002: disable puppet management of airflow services

https://gerrit.wikimedia.org/r/1093373

Change #1093373 merged by Brouberol:

[operations/puppet@production] an-test-client1002: disable puppet management of airflow services

https://gerrit.wikimedia.org/r/1093373

Change #1093378 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] hotfix: prevent puppet resource creation when no airflow instances are specified

https://gerrit.wikimedia.org/r/1093378

Change #1093378 merged by Brouberol:

[operations/puppet@production] hotfix: prevent puppet resource creation when no airflow instances are specified

https://gerrit.wikimedia.org/r/1093378

brouberol updated the task description.

We still have some unmerged patches.

Change #1094436 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow-analytics-test: use the cloudnative PG cluster

https://gerrit.wikimedia.org/r/1094436

Change #1093366 merged by jenkins-bot:

[operations/deployment-charts@master] airflow: allow multiple DAG folders to be pulled in

https://gerrit.wikimedia.org/r/1093366

Change #1093177 merged by jenkins-bot:

[operations/deployment-charts@master] airflow-analytics-test: deploy the scheduler and kerberos components

https://gerrit.wikimedia.org/r/1093177

Change #1097323 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] airflow-analytics-test: add namespace to the cloudnativePG tenant namespaces

https://gerrit.wikimedia.org/r/1097323

Change #1097323 merged by Brouberol:

[operations/deployment-charts@master] airflow-analytics-test: add namespace to the cloudnativePG tenant namespaces

https://gerrit.wikimedia.org/r/1097323

Change #1094436 merged by Brouberol:

[operations/deployment-charts@master] airflow-analytics-test: use the cloudnative PG cluster

https://gerrit.wikimedia.org/r/1094436