We have a few hostnames in `hieradata/common/profile/trafficserver/backend.yaml` that should be moved to discovery records for easier operations (e.g. reimage/flip/etc).
The ultimate goal is to make operations simpler than the current status quo for each service.
Services:
- [] Grafana
- [x] Logstash
- [x] Pyrra
- [x] Prometheus
- [x] Thanos
Since different services require different strategies, the following sections outline the trade-offs and solutions on a per-service basis.
==== grafana
This is the trickiest of all, I think. Ideally I (Filippo) would like a single patch or command to flip the active/standby grafana host. Note that whichever host grafana.w.o points to should also be reflected in `profile::grafana::active_host` (and `profile::grafana::standby_host`) for the "singleton" units (such as syncing ldap users) to follow.
A potential solution could look like this:
[x] introduce `grafana.discovery.wmnet` being a CNAME to the active host
[x] the trafficserver configuration above points to `grafana.discovery.wmnet`
[] change the puppet logic that detects the active host: resolve `grafana.discovery.wmnet`, and if it points to the same address as the host puppet is running on, we're on the active host; otherwise we're on the standby host
[] we also need to make sure we can serve grafana / grafana-next from any grafana host (right now we need to change `profile::grafana::domain` and `profile::grafana::domainrw` between codfw and eqiad when we flip). This should be doable by moving to an implementation where we have a set of "base" names (grafana, grafana-next) and then the redirect apache rules handle a list of such names and their redirect (basically redirect to base + "-rw" as needed)
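To make the "base names" idea in the last item concrete, here is a rough Python sketch of the mapping the apache redirect rules would need. It is only an illustration: the function name `rw_redirects` is made up, and the `wikimedia.org` suffix is an assumption based on grafana.w.o, not something taken from the current puppet code.

```python
def rw_redirects(base_names, domain="wikimedia.org"):
    """Map each public name to its "-rw" counterpart.

    `domain` is an assumed default; the real values live in
    profile::grafana::domain / profile::grafana::domainrw today.
    """
    return {f"{base}.{domain}": f"{base}-rw.{domain}" for base in base_names}


# Every grafana host would carry the same list, so nothing changes on a flip.
redirects = rw_redirects(["grafana", "grafana-next"])
for source, target in redirects.items():
    print(f"{source} -> {target}")
```

Since the list is identical on every host, a host can serve both grafana and grafana-next without per-site hiera changes.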
In this scenario a codfw/eqiad grafana flip translates to a single DNS patch to move grafana.discovery.wmnet and grafana-next.discovery.wmnet as needed.
I (Filippo) think a DNS patch is good enough for now given how infrequently we move grafana around. Alternatively, we can move `grafana.discovery.wmnet` to be controlled by conftool, in which case we can use `confctl --object-type discovery` to pool/depool.
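For the active-host detection described above, the check boils down to "does the discovery record resolve to one of my own addresses?". A minimal Python sketch of that logic follows; the real change would live in puppet, the function names are hypothetical, and the addresses are documentation-range placeholders rather than real hosts. The resolver is injectable so the example runs without touching DNS.

```python
import socket


def resolve_addresses(name, resolver=socket.getaddrinfo):
    """Return the set of IP addresses that `name` resolves to."""
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples;
    # the address is the first element of sockaddr.
    return {info[4][0] for info in resolver(name, None)}


def is_active_host(discovery_name, local_addresses, resolver=socket.getaddrinfo):
    """True if the discovery record points at one of this host's addresses."""
    return bool(resolve_addresses(discovery_name, resolver) & set(local_addresses))


def fake_resolver(name, port):
    # Stand-in for DNS: pretend the record points at 192.0.2.10 (placeholder).
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0))]


# The host owning 192.0.2.10 would consider itself active, the other standby.
print(is_active_host("grafana.discovery.wmnet", {"192.0.2.10"}, fake_resolver))
print(is_active_host("grafana.discovery.wmnet", {"192.0.2.20"}, fake_resolver))
```

With this approach the "singleton" units follow the DNS record automatically, so the flip stays a single change.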
==== logstash
This ties in with moving the read path for logs. Moving to a `confctl`-controlled `discovery.wmnet` record would make flipping datacenters for logstash quicker and in line with other services too. What do you think @colewhite? - SGTM!
==== prometheus
We need to point to individual hosts because we're using `mod_auth_cas`. When we move to oauth2-proxy for SSO authentication then we can replace those with prometheus.svc.SITE.wmnet. This is basically T326657
==== pyrra (includes slo/slos) [done]
[x] point to `thanos-web.discovery.wmnet`