The new Wikidata Platform team will need:
1. A dedicated Airflow scheduler instance.
2. The corresponding Kubernetes resources and deployment.
3. A new Airflow DAGs monorepo setup for their pipelines.
Could you help set this up (or advise on the process/owners) so the team can start taking ownership of data pipelines currently
deployed on the Search Platform instance?
The setup for this is outlined in https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Airflow/Kubernetes/Operations#Creating_a_new_instance:
[x] Create Kubernetes read and deploy user credentials
[x] Add a namespace
[x] Create the public and internal DNS records `airflow-wikidata.wikimedia.org`
[] Define the PG cluster and Airflow instance helmfile.yaml files and associated values (in review)
[x] Generate the S3 keypairs for both PG and Airflow
[x] Create the S3 buckets for both PG and Airflow
[x] Register the service in our IDP server
[x] Issue a Kerberos keytab
[x] Generate the secrets for both the PG cluster and the Airflow instance
[x] Register the PG bucket name and keys
[] Create the ops group for the instance
[] Create the dags folder and a sample DAG (see the sketch after this list)
[] Create UNIX user/group `analytics-wikidata` and the corresponding `analytics-wikidata-users`
[] Create the HDFS folders
[] Configure out-of-band backups
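
For the sample DAG item above, a minimal sketch of what could go into the new dags folder to verify that the scheduler, UI, and ownership settings work end to end. The `dag_id`, `owner`, and tags below are illustrative assumptions, not agreed names:

```python
# Minimal sample DAG sketch for the new wikidata Airflow instance.
# dag_id, owner, and tags are placeholder assumptions, not final values.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="wikidata_sample",               # hypothetical dag_id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "owner": "analytics-wikidata",      # assumed to match the new UNIX group
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    },
    tags=["wikidata", "sample"],
) as dag:
    # A no-op task: enough to confirm scheduling and task execution on the instance.
    EmptyOperator(task_id="noop")
```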