
Figure out how to deploy metricsinfra Prometheus configuration tooling
Open, Needs Triage, Public


The metricsinfra Prometheus configuration management system consists of two parts:

  • the manager, a Flask application that exposes information from a Trove database through an API. It currently runs under uWSGI on a metricsinfra VM and is reached via HAProxy.
  • the configurator, an agent-style program that contacts the manager API and generates configuration files for Prometheus and Alertmanager from the responses. It currently runs from a systemd timer.
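
To make the configurator's role concrete, here is a minimal sketch of one run: fetch data from the manager, render a file, and replace the old file only when the content changed. It is not the actual configurator code. The /v1/projects endpoint, the response shape (a list of objects with "name" and "targets") and the choice of Prometheus file_sd JSON as the output format are all assumptions made for illustration.

```python
import json
import os
import tempfile
import urllib.request


def fetch_projects(manager_url):
    """Fetch the project list from the manager API.

    The "/v1/projects" path and the response shape are hypothetical.
    """
    with urllib.request.urlopen(f"{manager_url}/v1/projects", timeout=10) as resp:
        return json.load(resp)


def render_file_sd(projects):
    """Render Prometheus file_sd target groups as deterministic JSON text."""
    groups = []
    for project in sorted(projects, key=lambda p: p["name"]):
        groups.append({
            "targets": sorted(project["targets"]),
            "labels": {"project": project["name"]},
        })
    return json.dumps(groups, indent=2, sort_keys=True) + "\n"


def write_if_changed(path, content):
    """Atomically replace path with content; return True if the file changed."""
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over the
    # target so readers never observe a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)
    os.chmod(tmp, 0o644)  # mkstemp creates the file with mode 0600
    os.replace(tmp, path)
    return True
```

A timer-driven run would then call write_if_changed(path, render_file_sd(fetch_projects(url))). Deterministic output (sorted keys and targets) keeps unchanged data from causing spurious rewrites, and file_sd target files are re-read by Prometheus without a reload.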

Both are simply git::cloned via Puppet with ensure => latest, which isn't exactly ideal (the Puppet patches are not in Gerrit yet). The configurator gets its dependencies from Debian packages, but the manager needs some virtualenv trickery (again via Puppet) because it uses packages that are not currently packaged for Debian. I'm not sure what the best approach to deploying these is: I'm leaning towards a Debian package for the configurator but am not sure about the manager. Scap deployments come to mind too, but I have very little experience with either option. This task is to figure out which options to use and to implement them.

Event Timeline

Oh, also the manager will likely need to run some cron jobs too, for example to keep notification groups up to date with project membership. In theory Kubernetes would be ideal for that (it can make sure a cron job runs at a given time on some node, without caring which node that is), but putting it in Toolforge isn't exactly ideal for reliability and separation reasons, and setting up a whole Kubernetes cluster for this seems like overkill.
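
Whatever ends up scheduling it, the membership job itself could be a small idempotent reconcile step, so that a missed or repeated run does no harm. This is a sketch under assumptions: the function name and the idea that members are plain username strings are hypothetical, not taken from the manager code.

```python
def plan_membership_sync(project_members, group_members):
    """Return (to_add, to_remove) so the group ends up matching the project."""
    project = set(project_members)
    group = set(group_members)
    # Sorted output keeps logs and tests stable between runs.
    return sorted(project - group), sorted(group - project)
```

Because the plan is computed from current state on both sides, running it from a systemd timer on a single VM or from a cluster-level cron job gives the same result.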