The metricsinfra VM discovery logic needs to be updated: Magnum breaks the assumption that every VM runs a Puppetized base image with node-exporter running.
Currently we use Prometheus's built-in OpenStack service discovery, with one scrape job per project. The list of projects is managed by prometheus-manager, which is configured to exclude the trove project, our existing case of non-Puppetized instances.
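As a rough illustration of the current setup, this is a minimal sketch (not prometheus-manager's actual code) of generating one `openstack_sd_configs` job per project while skipping trove. The job naming scheme, region and Keystone URL are placeholders, and authentication fields are left out.

```python
import json

# Assumption: trove is the only excluded project, mirroring the text above.
EXCLUDED_PROJECTS = {"trove"}


def build_scrape_configs(projects, region, identity_endpoint, port=9100):
    """Return one Prometheus scrape job per project, skipping excluded ones."""
    jobs = []
    for project in sorted(projects):
        if project in EXCLUDED_PROJECTS:
            # Not Puppetized, so there is no node-exporter to scrape.
            continue
        jobs.append({
            # Hypothetical job naming scheme.
            "job_name": f"project_{project}",
            "openstack_sd_configs": [{
                "role": "instance",
                "region": region,
                "identity_endpoint": identity_endpoint,
                "project_name": project,
                # 9100 is node-exporter's default port.
                "port": port,
                # Authentication fields (username/password or application
                # credentials, domain_name) are omitted from this sketch.
            }],
        })
    return jobs


if __name__ == "__main__":
    configs = build_scrape_configs(
        ["tools", "trove", "paws"], "example-region", "https://keystone.example.org/v3"
    )
    # JSON is valid YAML, so this output could be embedded in prometheus.yml.
    print(json.dumps({"scrape_configs": configs}, indent=2))
```

Because discovery happens inside Prometheus, every instance in a listed project becomes a target. There is no per-instance filtering at this layer, which is why Magnum nodes end up being scraped.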
To fix this, we would need to either:
- figure out how to configure Prometheus's openstack_sd_config to exclude Magnum-managed instances, or
- figure out how to detect them in Python, and move instance discovery into prometheus-manager (while keeping the frequent update rate).
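For the second option, here is a minimal sketch. It assumes prometheus-manager already has an instance list (for example from the Nova API) and writes Prometheus `file_sd_configs` target files. The `Instance` type, the `magnum_cluster_uuid` metadata key and the `project` label are hypothetical. How to reliably recognise Magnum-managed instances is exactly the open question, so `is_magnum_managed` is only a placeholder.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    ip: str
    project: str
    metadata: dict = field(default_factory=dict)


# HYPOTHETICAL heuristic: assumes Magnum-created servers carry this metadata
# key. Check what Magnum actually sets on its instances before relying on it.
MAGNUM_METADATA_KEY = "magnum_cluster_uuid"


def is_magnum_managed(instance):
    """Placeholder check for Magnum-managed instances."""
    return MAGNUM_METADATA_KEY in instance.metadata


def to_file_sd(instances, port=9100):
    """Group non-Magnum instances by project into Prometheus file_sd format."""
    by_project = {}
    for inst in instances:
        if is_magnum_managed(inst):
            continue  # no node-exporter on these, so don't scrape them
        by_project.setdefault(inst.project, []).append(f"{inst.ip}:{port}")
    return [
        {"targets": sorted(targets), "labels": {"project": project}}
        for project, targets in sorted(by_project.items())
    ]


def write_atomically(path, groups):
    """Write the target file via rename so Prometheus never reads a partial file."""
    tmp = f"{path}.tmp"
    with open(tmp, "w") as f:
        json.dump(groups, f, indent=2)
    os.replace(tmp, path)
```

To keep the update rate close to what openstack_sd_config provides, prometheus-manager would have to regenerate the file on a short timer. Prometheus watches file_sd files and picks up changes without a reload.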