In today's Analytics Systems hangtime meeting, we talked with @fkaelin and @gmodena about work they want to do with Airflow. I told them that Airflow is basically ready for testing, and we could create instances for them now if they liked. They do like! They understand that we are still iterating and figuring it out for ourselves too. It will do us all good to be able to work out best practices together.
Let's create instances for them now. We haven't done this before, so we'll likely need to formalize this process. It will be something like:
- Create new Ganeti VMs: an-airflow1002 (research), an-airflow1003 (platform eng)
- Create new system users: analytics-research, analytics-platform-eng. Declare these system users in profile::analytics::cluster::users. They should also be added to admin data.yaml, but commented out until T231067 is complete (as other system users are). See analytics-search as an example.
- Create new user groups analytics-research-users and analytics-platform-eng-users, listing the relevant users in members and the system user in system_members. Members of these groups should have sudo privileges to their system user. Also include the system users in the analytics-privatedata-users group. See analytics-search-users as an example. We'll also need to make sure users in these groups can manage airflow services. See airflow-search-admins for an example. Q: should these be -admins groups instead of -users groups? If members need sudo privileges, perhaps they should.
- Create kerberos principals and keytabs for these system users at their airflow VM hostnames, following https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos#Create_a_keytab_for_a_service
- Create the airflow instances on the VMs following https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow#Creating_a_new_Airflow_Instance
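
As a starting point for formalizing this, here is a rough sketch of how the per-team names in the steps above could be derived from a team name and a hostname. It only uses the names already listed in this task. The `AirflowInstancePlan` class and its `checklist` method are hypothetical helpers for illustration, not existing tooling, and the `-users` group suffix is still subject to the open question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AirflowInstancePlan:
    """Hypothetical helper: names needed for one team's Airflow instance."""

    team: str      # e.g. "research"
    hostname: str  # Ganeti VM name, e.g. "an-airflow1002"

    @property
    def system_user(self) -> str:
        # Naming pattern taken from this task: analytics-<team>
        return f"analytics-{self.team}"

    @property
    def user_group(self) -> str:
        # Assumes the -users suffix; may become -admins (see question above)
        return f"{self.system_user}-users"

    def checklist(self) -> list[str]:
        # One entry per step in the list above, in the same order
        return [
            f"Create Ganeti VM {self.hostname}",
            f"Create system user {self.system_user}",
            f"Create user group {self.user_group}",
            f"Create kerberos principal and keytab for {self.system_user} at {self.hostname}",
            f"Create the airflow instance on {self.hostname}",
        ]


# The two instances requested in this task
PLANS = [
    AirflowInstancePlan(team="research", hostname="an-airflow1002"),
    AirflowInstancePlan(team="platform-eng", hostname="an-airflow1003"),
]

for plan in PLANS:
    print(f"== {plan.team} ==")
    for step in plan.checklist():
        print(f"- {step}")
```

Adding a future team would then just be one more entry in `PLANS`, which might make it easier to keep the naming consistent once the process is written down.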