Cloudflare has published their software for linting / checking Prometheus alerts: https://github.com/cloudflare/pint . We'll need to evaluate it for our use cases and see if we get value out of it.
As of Feb 2023 we have implemented the following:
- CI checks for operations/alerts.git, based on pint
- runtime pint checks for instance-specific alerts (i.e. alert files with a deploy-tag (docs))
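
The distinction between alert files with and without a deploy-tag can be illustrated with a small sketch. It assumes the tag is written as a comment header of the form `# deploy-tag: <tag>[, <tag>...]`; that header format and both function names are assumptions for illustration and are not taken from the actual tooling.

```python
import re

# Assumed (hypothetical) header format: "# deploy-tag: ops" or "# deploy-tag: ops, k8s"
DEPLOY_TAG_RE = re.compile(r"^#[ \t]*deploy-tag:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def deploy_tags(alert_file_text):
    """Return the deploy tags declared in an alert file, or [] if there are none."""
    match = DEPLOY_TAG_RE.search(alert_file_text)
    if match is None:
        return []
    # Tags are assumed to be comma-separated on a single line.
    return [tag.strip() for tag in match.group(1).split(",") if tag.strip()]


def has_runtime_pint_checks(alert_file_text):
    """Per the status above, only files carrying a deploy-tag get runtime checks."""
    return bool(deploy_tags(alert_file_text))
```

Under these assumptions, a file without the header would fall into the "non-instance-specific" group that is still listed as TODO.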
Still TODO:
- Add 'pint' support for global/thanos alerts
- Plan for non-instance-specific alerts (i.e. those currently without a deploy-tag)
- Plan for cloudmetrics and pint checking (Prometheus is moving off cloudmetrics)