Project administrators should be able to configure Prometheus scrape targets and alert rules for their project without making changes to operations/puppet. In the long term, I can see two viable ways to achieve this:
- Enable management via Hiera/Puppet
  - Pro: Nice to deal with in a project that is otherwise managed with Puppet
  - Con (?): Difficult to use - is Hiera easy enough for the target audience?
  - Con: Difficult to get proper authentication done
  - Con: How to deal with services that are not bound to a single VM - take Kubernetes pods for example
- Create a web UI/Horizon interface
  - Pro: Ease of use
  - Con: Harder to get something like "Scrape all Toolforge Redis hosts on port X" automated
  - Con: Requires manual clicking for large projects managed with Puppet
  - Con: Either have to deal with developer account authentication on cloud realm or have a prod-cloud connection
Bonus points if the solution can automatically make sure the required security group rules are present.
My short-term plan is to create a tool that is customized with per-project config files (for example "Scrape all Toolforge Redis hosts on port X" or "Alerting rule Y is present") and that generates the full configuration for Prometheus and Alertmanager. It's rather bare-bones, but it's better than the current static configuration and gives us a good foundation for further development, for example adding a database and an API to modify rules.
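
To make the short-term plan more concrete, here is a rough sketch of what such a generator could look like. Everything in it is an assumption for illustration only: the per-project config layout (`project`, `scrape`, `alerts`), the host names, the port, and the function names are hypothetical, and hosts are listed explicitly rather than discovered. Only the output keys (`job_name`, `static_configs`, `targets`, `groups`, `alert`, `expr`, `for`, `labels`) follow the Prometheus configuration and rule file formats. The output is printed as JSON, which Prometheus can read because JSON is valid YAML.

```python
import json

# Hypothetical per-project config; keys, hosts and port are made up for this sketch.
PROJECT_CONFIG = {
    "project": "tools",
    "scrape": [
        {"job": "redis", "hosts": ["tools-redis-1", "tools-redis-2"], "port": 9121},
    ],
    "alerts": [
        {"name": "RedisDown", "expr": 'up{job="tools_redis"} == 0', "for": "5m"},
    ],
}


def build_scrape_configs(config):
    """Turn the per-project scrape entries into Prometheus scrape_configs."""
    project = config["project"]
    jobs = []
    for entry in config.get("scrape", []):
        targets = [f"{host}:{entry['port']}" for host in entry["hosts"]]
        jobs.append({
            # Prefix the job name with the project to avoid clashes between projects.
            "job_name": f"{project}_{entry['job']}",
            "static_configs": [
                {"targets": targets, "labels": {"project": project}},
            ],
        })
    return jobs


def build_rule_groups(config):
    """Turn the per-project alert entries into Prometheus rule groups."""
    project = config["project"]
    rules = [
        {
            "alert": alert["name"],
            "expr": alert["expr"],
            "for": alert.get("for", "0m"),
            # The project label lets Alertmanager route alerts per project.
            "labels": {"project": project},
        }
        for alert in config.get("alerts", [])
    ]
    return [{"name": f"{project}_alerts", "rules": rules}] if rules else []


if __name__ == "__main__":
    print(json.dumps({"scrape_configs": build_scrape_configs(PROJECT_CONFIG)}, indent=2))
    print(json.dumps({"groups": build_rule_groups(PROJECT_CONFIG)}, indent=2))
```

Keeping the generator as pure functions over a config dict means the config files could later be replaced by a database or API without changing how the Prometheus output is built.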