We have a few hostnames in `hieradata/common/profile/trafficserver/backend.yaml` that should be moved to discovery records for easier operations (e.g. reimaging or flipping hosts).
The ultimate goal is to simplify operations for each service compared to the current status quo.
Namely:
```
target: http://grafana-rw.wikimedia.org
replacement: https://grafana1002.eqiad.wmnet
target: http://grafana-next-rw.wikimedia.org
replacement: https://grafana2001.codfw.wmnet
target: http://grafana.wikimedia.org
replacement: https://grafana1002.eqiad.wmnet
target: http://grafana-next.wikimedia.org
replacement: https://grafana2001.codfw.wmnet
target: http://logstash.wikimedia.org
replacement: https://kibana7.svc.eqiad.wmnet
target: http://prometheus-eqiad.wikimedia.org
replacement: https://prometheus1005.eqiad.wmnet
target: http://prometheus-codfw.wikimedia.org
replacement: https://prometheus2005.codfw.wmnet
target: http://pyrra.wikimedia.org
replacement: http://titan1001.eqiad.wmnet
target: http://slo.wikimedia.org
replacement: http://titan1001.eqiad.wmnet
target: http://slos.wikimedia.org
replacement: http://titan1001.eqiad.wmnet
```
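To make it easier to see which of the mappings above are pinned to an individual host, here is a minimal Python sketch. The naming heuristic (`name` + four digits + site + `wmnet`) and the `is_single_host` helper are assumptions for illustration only; the sample covers just a subset of the entries listed above.

```python
import re
from urllib.parse import urlsplit

# Assumed heuristic: a name such as grafana1002.eqiad.wmnet (four digits
# right before the site label) is an individual host, whereas
# kibana7.svc.eqiad.wmnet or *.discovery.wmnet is a service record.
SINGLE_HOST_RE = re.compile(r"^[a-z-]+\d{4}\.(eqiad|codfw)\.wmnet$")


def is_single_host(url: str) -> bool:
    """Return True if the URL's hostname looks like an individual host."""
    host = urlsplit(url).hostname or ""
    return bool(SINGLE_HOST_RE.match(host))


# A subset of the target -> replacement pairs from the listing above.
MAPPINGS = {
    "http://grafana.wikimedia.org": "https://grafana1002.eqiad.wmnet",
    "http://logstash.wikimedia.org": "https://kibana7.svc.eqiad.wmnet",
    "http://pyrra.wikimedia.org": "http://titan1001.eqiad.wmnet",
}

# Targets whose replacement is still a single host.
pinned = sorted(t for t, r in MAPPINGS.items() if is_single_host(r))

if __name__ == "__main__":
    for target in pinned:
        print(f"{target} -> {MAPPINGS[target]}")
```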
Since different services require different strategies, the following sections outline the trade-offs and solutions on a per-service basis.
==== grafana
This is the trickiest of all, I think. Ideally I (Filippo) would like a single patch or command to flip the active/standby grafana host. Note that whatever host grafana.w.o points to should also be reflected in `profile::grafana::active_host` (and `profile::grafana::standby_host`) for the "singleton" units (such as syncing ldap users) to follow.
A potential solution could look like this:
* introduce `grafana.discovery.wmnet` being a CNAME to the active host (done)
* the trafficserver configuration above points to `grafana.discovery.wmnet` (done)
* change the puppet logic that detects the active host so that it resolves `grafana.discovery.wmnet`: if the record points to the same address as the host puppet is running on, we're on the active host, otherwise we're on the standby host
* we also need to make sure we can serve grafana / grafana-next from any grafana host (right now we need to change `profile::grafana::domain` and `profile::grafana::domainrw` between codfw and eqiad when we flip). This should be doable by moving to an implementation where we have a set of "base" names (grafana, grafana-next) and the apache redirect rules handle a list of such names and their redirects (basically redirect to base + "-rw" as needed)
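The last two bullets can be sketched in Python to make the intended logic concrete. This is only an illustration under stated assumptions, not the puppet implementation: the function names, the injectable `resolver` parameter and the IP addresses in the usage note are all hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable, Set


def resolve_addresses(name: str) -> Set[str]:
    """Return every IP address `name` resolves to (CNAMEs are followed)."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def is_active_host(
    discovery_name: str,
    local_addresses: Iterable[str],
    resolver: Callable[[str], Set[str]] = resolve_addresses,
) -> bool:
    """True if the discovery record points at one of this host's addresses."""
    return bool(resolver(discovery_name) & set(local_addresses))


def rw_name(base: str, domain: str = "wikimedia.org") -> str:
    """Derive the read-write name from a base name: base + "-rw"."""
    return f"{base}-rw.{domain}"


def redirect_map(bases: Iterable[str], domain: str = "wikimedia.org") -> Dict[str, str]:
    """Map each base name to the read-write name it would redirect to."""
    return {f"{base}.{domain}": rw_name(base, domain) for base in bases}
```

For example, with a stubbed resolver and made-up addresses, `is_active_host("grafana.discovery.wmnet", ["10.0.0.1"], resolver=lambda _: {"10.0.0.1"})` is true, and `redirect_map(["grafana", "grafana-next"])` yields one entry per base name. Injecting the resolver keeps the comparison testable without real DNS.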
==== logstash
This ties in with moving the read path for logs. Moving to a `confctl`-controlled `discovery.wmnet` record would make flipping datacenters for logstash quicker and bring it in line with other services. What do you think, @colewhite?
==== prometheus
We need to point to individual hosts because we're using `mod_auth_cas`. Once we move to oauth2-proxy for SSO authentication, we can replace those with `prometheus.svc.SITE.wmnet`. This is basically T326657.
==== pyrra (includes slo/slos)
I believe we could point these to `thanos-query.discovery.wmnet` right away. What do you think, @herron?