Action plan
- Update cr firewall rules to allow this traffic (T343885#9119388)
- Add a new profile::prometheus::cloud class (and rename labs to cloud in the process)
- Allocate space on prometheus LVs (eqiad only) (modules/prometheus/files/provision-fs.sh)
- Deploy said class to prometheus hosts (eqiad only), making sure alertmanagers parameter is not set so alerts won't be sent.
- Verify Prometheus can pull metrics from all its jobs.
- Let metrics data accumulate for TBD days
- Enable alerts to be sent from prometheus, disable alerts from cloudmetrics
- Audit accesses on cloudmetrics hosts for Prometheus clients
- Update cloudmetrics references to point to prometheus.svc/cloud instead
Followups
- Add thanos upload/sidecar
- Remove labs Prometheus instance from cloudmetrics
- Duplicate setup to codfw? T350010
- Rename grafana datasource to eqiad prometheus/cloud. These dashboards will need updating: https://phabricator.wikimedia.org/P53026 . T350013