Gerrit will soon be behind the CDN (T411895), so the tcp-proxy VMs are a production dependency. We should make sure these instances are monitored properly. We need at least:
- haproxy exporter (already running on :9422)
- Proper dashboards for the haproxy instances -> https://grafana-rw.wikimedia.org/d/459365f6-df37-48d6-8142-82b22c1875e7/gerrit-tcp-proxy
- Some basic alerts -> https://gerrit.wikimedia.org/r/1236746
  - when we lose more than one(?) tcp-proxy in a DC
  - backend unavailable (requires additional silences during gerrit maintenance)
  - high error rate?
- existing blackbox checks
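
To make the first alert condition concrete, here is a rough sketch of the "more than one tcp-proxy down in a DC" logic. This is not the rule from the Gerrit change above. The instance names, the DC labels and the `max_down` threshold are made up for illustration, and the real threshold is still an open question.

```python
from collections import Counter


def dcs_with_too_many_down(up_by_instance, dc_of, max_down=1):
    """Return the DCs where more than `max_down` tcp-proxy instances are down."""
    # Count the instances reported as down, grouped by their DC.
    down = Counter(dc_of[inst] for inst, up in up_by_instance.items() if not up)
    return sorted(dc for dc, n in down.items() if n > max_down)


# Hypothetical instance names and DC labels, for illustration only.
dc_of = {
    "proxy-a1": "dc-a", "proxy-a2": "dc-a", "proxy-a3": "dc-a",
    "proxy-b1": "dc-b", "proxy-b2": "dc-b",
}
up = {
    "proxy-a1": False, "proxy-a2": False, "proxy-a3": True,
    "proxy-b1": False, "proxy-b2": True,
}

print(dcs_with_too_many_down(up, dc_of))  # dc-a has two down, dc-b only one
```

With `max_down=1`, a single proxy going down in a DC does not fire, which should keep the alert quiet during a rolling restart of one VM at a time.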