
Enable self-service Prometheus configuration management for project administrators
Open, Medium, Public

Description

Project administrators should be able to configure Prometheus scrape targets and alert rules for their project without making changes to operations/puppet. In the long term, I can see two good ways to achieve this:

  • Enable management via Hiera/Puppet
    • Pro: Nice to deal with in a project that is otherwise managed with Puppet
    • Con (?): Difficult to use - is Hiera easy enough for the target audience?
    • Con: Difficult to get proper authentication done
    • Con: How to deal with services that are not bound to a single VM, such as Kubernetes pods?
  • Create a web UI/Horizon interface
    • Pro: Ease of use
    • Con: Harder to get something like "Scrape all Toolforge Redis hosts on port X" automated
    • Con: Requires manual clicking for large projects managed with Puppet
    • Con: Either have to deal with developer account authentication in the cloud realm or have a prod-cloud connection

Bonus points if the solution can automatically make sure the required security group rules are present.

My short-term plan is to create a tool that can be customized with per-project config files ("Scrape all Toolforge Redis hosts on port X", "Alerting rule Y is there") and that generates the full configuration for Prometheus and Alertmanager. It's rather bare-bones, but it's better than the current static configuration and gives us a good foundation for further development, for example adding a database and an API to modify rules.
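
To make the short-term plan more concrete, here is a rough sketch of how such a generator could expand a per-project config into Prometheus scrape configs and an alerting rule group. This is only an illustration under stated assumptions, not the actual tool: the config field names (`scrape`, `host_pattern`, `alerts`), the host names, the `example.org` domain, and the port are all hypothetical. Only the output keys (`job_name`, `static_configs`, `targets`, and the `alert`/`expr`/`for`/`labels` rule fields) follow the Prometheus configuration format.

```python
import re

# Hypothetical per-project config; every field name here is an assumption
# made for illustration, not the format of any existing tool.
PROJECT_CONFIG = {
    "project": "tools",
    "scrape": [
        # "Scrape all Toolforge Redis hosts on port X"
        {"job": "redis", "host_pattern": r"^tools-redis-\d+$", "port": 9121},
    ],
    "alerts": [
        # "Alerting rule Y is there"
        {"name": "RedisDown", "expr": 'up{job="tools_redis"} == 0', "for": "5m"},
    ],
}


def build_scrape_configs(config, hostnames, domain):
    """Expand each scrape rule into a scrape config with static targets."""
    project = config["project"]
    scrape_configs = []
    for rule in config.get("scrape", []):
        pattern = re.compile(rule["host_pattern"])
        # Keep only the hosts matching the rule, in a stable order.
        targets = sorted(
            f"{host}.{project}.{domain}:{rule['port']}"
            for host in hostnames
            if pattern.search(host)
        )
        scrape_configs.append(
            {
                "job_name": f"{project}_{rule['job']}",
                "static_configs": [{"targets": targets}],
            }
        )
    return scrape_configs


def build_rule_group(config):
    """Turn the project's alert definitions into one alerting rule group."""
    project = config["project"]
    return {
        "name": f"{project}_alerts",
        "rules": [
            {
                "alert": alert["name"],
                "expr": alert["expr"],
                "for": alert["for"],
                # Label each alert with its project so it can be routed later.
                "labels": {"project": project},
            }
            for alert in config.get("alerts", [])
        ],
    }


# Hypothetical host list; a real tool would fetch this from the cloud API.
HOSTNAMES = ["tools-redis-1", "tools-redis-2", "tools-web-1"]
SCRAPE_CONFIGS = build_scrape_configs(PROJECT_CONFIG, HOSTNAMES, "example.org")
RULE_GROUP = build_rule_group(PROJECT_CONFIG)
```

The resulting dictionaries would still need to be serialized to YAML and written to the files Prometheus reads. Keeping generation as plain functions over data should also make it easier to swap the config files for a database and API later.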

Event Timeline

taavi triaged this task as Medium priority. Jul 7 2021, 5:54 PM
taavi removed taavi as the assignee of this task. Sep 18 2023, 5:36 PM

Not actively working on this.