Zuul will use the Magnum-provisioned Kubernetes cluster in the "zuul" Cloud VPS project as the compute environment for its jobs. We will want some monitoring of the cluster to help with capacity planning as well as incident response.
Magnum has feature flags for deploying various monitoring components: https://docs.openstack.org/magnum/latest/user/monitoring.html. These can be scaled up all the way to a Prometheus instance with Alertmanager and Grafana. Cloud VPS has its own Prometheus + Grafana + Alertmanager stack that can be used for various monitoring tasks, but that system requires quite a bit of manual work to go beyond basic instance metric collection (some detail in T315695: Add basic MediaWiki/web site up alerting to the Beta Cluster). Maybe we should start with a self-contained solution deployed through Magnum instead?
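As a rough sketch of what turning on the Magnum flags could look like: Magnum takes these options as cluster labels, passed as a comma-separated `key=value` string to the `--labels` option of `openstack coe cluster template create`. The label names below (`metrics_server_enabled`, `monitoring_enabled`) are taken from the linked Magnum docs as we understand them and should be checked against the Magnum release deployed in Cloud VPS. The helper `labels_argument` is hypothetical and only formats the string.

```python
# Assumption: label names come from the Magnum monitoring docs and may
# differ between Magnum releases; verify before using them.
MONITORING_LABELS = {
    "metrics_server_enabled": "true",  # basic resource metrics
    "monitoring_enabled": "true",      # Prometheus-based monitoring stack
}


def labels_argument(labels):
    """Render a dict as the comma-separated key=value string used by --labels."""
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


if __name__ == "__main__":
    # The printed string would be passed as the value of --labels.
    print(labels_argument(MONITORING_LABELS))
```

Starting with only the metrics server label and adding the full monitoring label later would be one way to scale up gradually.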