At the moment, Product Analytics has about 10 active ETL jobs with widely differing setups (Oozie, a shared systemd timer, individual cron jobs, etc.). We have decided to standardize on Puppet-managed systemd timers, with the code stored in the analytics/wmf-product/jobs repo. The movement_metrics job already uses this approach.
There are two main phases (each will get subtasks soon):
- Standardize the process. We want to use shared code where possible (e.g. a shared shell script, or a Puppet 'class' that streamlines creating a systemd timer) and write good documentation.
- Migrate all existing jobs. Once we have a standard process, we will need to move every existing job (e.g. wikipediapreview_stats, Morten's cron jobs for Growth) over to it.