The Analytics team would like a dedicated Airflow instance to start experimenting with, as the Discovery team has been doing so far. The goal is to try Airflow with different workloads to understand requirements, missing pieces, etc., not to produce a final RFC about how multiple teams should use Airflow. That will come in a later step :)
It would be nice to re-use all the work done so far, generalizing it a bit to remove the Discovery/Search-specific bits. The working assumption is that Airflow doesn't really handle multi-tenancy, especially when it comes to separate Kerberos credentials (for example, running jobs as analytics-search vs analytics).
Overall steps:
- Create a new VM called an-airflow1002 in Ganeti (specs to be decided, but something close to an-airflow1001 should be fine as a starting point).
- Generalize the gerrit search/airflow repository. It should contain only airflow-related things, but we may want something under the analytics/airflow namespace as well. The main thing to figure out, in my opinion, is how to handle different versions of Airflow in the same repo (master branch vs version-specific branches, etc.). We could also keep the two repositories split for the moment: the Discovery team runs Airflow 1.10.6, but the Analytics team might want to jump directly to 2.0.0 (it is already available on PyPI).
- Generalize the puppet code to remove the Discovery/Search-specific bits; this shouldn't be too complicated.
- Think about common plugins to share. IIUC the Discovery team has already started creating some (e.g. for Swift upload), and it would be nice to share as much as possible :) This step can be done later on, but it would be good to start thinking about it as early as possible.
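On the multiple-versions question above: a number of import paths changed between Airflow 1.10.x and 2.0.0 (e.g. `airflow.operators.bash_operator` became `airflow.operators.bash`), so DAG files written for one version don't necessarily parse under the other. A minimal sketch of the kind of compatibility shim a shared repo would need if it tried to serve both versions from one branch (illustrative only, using BashOperator as the example):

```python
# Illustrative compatibility shim: let the same DAG file import
# BashOperator under both the Airflow 2.0 and 1.10 module layouts.
try:
    from airflow.operators.bash import BashOperator  # Airflow >= 2.0 layout
except ImportError:
    try:
        from airflow.operators.bash_operator import BashOperator  # 1.10.x layout
    except ImportError:
        # Airflow not installed at all (e.g. on a dev laptop): keep the
        # name defined so this sketch stays importable either way.
        BashOperator = None
```

The alternative (version-specific branches) avoids sprinkling shims like this everywhere, at the cost of having to backport shared changes between branches.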
The final goal for this task is to have an-airflow1002 running (with the analytics user).
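On the puppet generalization step, a rough sketch of the direction: the values that currently differ between teams become parameters of a shared profile instead of being hardcoded to discovery/search. All class and parameter names below are hypothetical, not the actual layout of the puppet repo:

```puppet
# Hypothetical sketch: team-specific bits (service user, Kerberos
# principal, dags repository, Airflow version) become parameters.
class profile::airflow (
    String $service_user       = lookup('profile::airflow::service_user'),  # e.g. 'analytics'
    String $airflow_version    = lookup('profile::airflow::version'),       # e.g. '2.0.0'
    String $dags_repository    = lookup('profile::airflow::dags_repository'),
    String $kerberos_principal = lookup('profile::airflow::kerberos_principal'),
) {
    # Install Airflow, render airflow.cfg, and run the scheduler and
    # webserver as $service_user -- details omitted in this sketch.
}
```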