@CDanis noticed that ats-be is considering all requests to `/check` as internal health-checks, regardless of what the `Host` header says. While looking at this problem I've seen a bunch of related issues that should be fixed.
There are three different type of diagnostic requests that Varnish historically used to handle. The first were requests issued by `varnish-fe` to `varnish-be` to check the cache backend layer status. They look like this: `http://varnishcheck/check`. The second are checks coming from PyBal and directed, logically speaking, to varnish-fe. PyBal requests target `https://varnishcheck.wikimedia.org/from/pybal`: they go through the TLS termination layer (once nginx, now ats-tls) and then hit `varnish-fe`. The third are Icinga requests targeting `varnishcheck/wikimedia-monitoring-test` and hitting varnish-fe directly. Confusingly, our VCL handles all these requests as follows in `wm_recv_early`. This VCL code used to be shared by both Varnish frontends and backends, and worked for both given that the `Host` header does indeed start with "varnishcheck" in the case of `http://varnishcheck/check`, `https://varnishcheck/wikimedia-monitoring-test` and `https://varnishcheck.wikimedia.org/from/pybal`:
```
if ( req.http.host ~ "^varnishcheck" ) {
return (synth(200, "healthcheck"));
}
```
To make things less confusing:
[x] ats-be should only consider requests to `//varnishcheck/check` as health-checks performed by the frontend layer, ie check the `Host` header in addition to the request URI
[] varnish-fe should only consider requests to `//varnishcheck.wikimedia.org/from/pybal` and ``//varnishcheck.wikimedia.org/wikimedia-monitoring-test``, ie ensure that the Host header is exactly varnishcheck.wikimedia.org, and that the request URI is `/from/pybal` or `/wikimedia-monitoring-test`
Once the above has been done as proof that we do understand the status quo, we then should make the world more intuitive and self-explanatory:
[] We should transition to specific URIs for healthchecks, with the path component named after the intended //recipient// of the heathcheck, and a more generic and self-explanatory hostname. For example: `http://healthcheck.wikimedia.org/varnishfe`, `https://healthcheck.wikimedia.org/ats-tls`, etc.
* This makes it much more obvious what's going on, is more future-proof, and allows interesting possibilities like sending a query to ats-tls to check if the stack is selecting healthy varnishfes or ats-bes.
* These healthcheck queries should probably never produce TLS redirects?
[] Currently, on a healthcheck query ats-be adds `bug` as the status indicator to `X-Cache`/`X-Cache-Status`. We should match historical precedent here and use `int` (for "internal")
Related, but out of scope for this ticket is T251301.