Today I was upgrading Thanos in T303154. Depooling thanos-query mostly worked as expected (Grafana kept its existing connections; to be investigated later), but I noticed that the thanos.w.o web interface did not switch over. After a bit of digging I remembered T151009: Provide authenticated access to Thanos native web interface. Because of that task, web requests for thanos.w.o are backed by thanos-sso.discovery.wmnet, which is a CNAME to a single host (because SSO sessions are not shared among hosts, at least as of Jul 2020).
The setup works fine, though it complicates failover and is inconsistent with how the rest of Thanos works. Therefore I think we should:
- Investigate whether we can now share SSO sessions between hosts (cc @jbond @Muehlenhoff, as they would know). If that's possible/supported, we can switch thanos.w.o back to being proxied to thanos-query.discovery.wmnet
- If the above isn't possible/desired, turn thanos-sso.discovery.wmnet into another service IP with load balancing hashed on the client IP; this way we can pool/depool hosts with confctl as we do for thanos-swift and thanos-query
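To illustrate why hashing on the client IP would keep per-host SSO sessions working behind a shared service IP, here is a toy model in Python. It is only a sketch and not how LVS/IPVS implements its source-hashing scheduler: it uses rendezvous (highest-random-weight) hashing instead, a swapped-in technique chosen because depooling one host only moves the clients that were mapped to that host. The hostnames, client IPs and the `pick_backend` helper are all hypothetical.

```python
import hashlib
import ipaddress


def _score(client_ip: str, backend: str) -> int:
    # Deterministic score for a (client, backend) pair; hashlib is stable
    # across processes, unlike Python's built-in hash().
    key = f"{ipaddress.ip_address(client_ip)}|{backend}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_backend(client_ip: str, pooled: list[str]) -> str:
    # Rendezvous hashing: score every pooled backend for this client and
    # send the client to the highest-scoring one.
    if not pooled:
        raise ValueError("no pooled backends")
    return max(pooled, key=lambda b: _score(client_ip, b))


# Hypothetical frontends and clients, for illustration only.
backends = ["thanos-fe1001", "thanos-fe1002", "thanos-fe1003"]
clients = [f"10.64.0.{i}" for i in range(1, 51)]

before = {c: pick_backend(c, backends) for c in clients}

# Simulate depooling one host (what a confctl depool would do to the pool).
still_pooled = [b for b in backends if b != "thanos-fe1002"]
after = {c: pick_backend(c, still_pooled) for c in clients}

# Only clients that were on the depooled host end up on a different backend.
moved = [c for c in clients if before[c] != after[c]]
print(f"{len(moved)} of {len(clients)} clients moved")
```

Clients that were on the depooled host would still lose their SSO session (sessions aren't shared between hosts), but every other client keeps hitting the same backend and stays logged in.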