This task tracks the deployment of the Thanos stateless components. The big win is a query endpoint that can reach all Prometheus instances and merge/deduplicate results as needed.
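As a rough illustration of the deduplication idea: series scraped by replicas `A` and `B` of the same Prometheus instance carry identical labels except for `replica`, so the query layer can treat them as one series. The sketch below is a simplified model, not Thanos' actual algorithm (Thanos merges samples across replicas rather than picking one). The `dedup` helper, the label names and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[float, float]  # (timestamp, value)
Series = Tuple[Labels, List[Sample]]


def dedup(series: List[Series], replica_labels: FrozenSet[str]) -> List[Series]:
    """Collapse series that differ only in their replica labels.

    Simplified model: for each group, keep the replica with the most samples
    (the first one seen wins on ties). Thanos itself merges samples from all
    replicas; this only illustrates which series count as duplicates.
    """
    groups: Dict[Tuple[Tuple[str, str], ...], Series] = {}
    for labels, samples in series:
        # Dropping the replica labels yields the identity shared by all replicas.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in replica_labels))
        best = groups.get(key)
        if best is None or len(samples) > len(best[1]):
            groups[key] = (dict(key), samples)
    return list(groups.values())
```

For example, passing `frozenset({"replica"})` collapses `{"__name__": "up", "instance": "ops", "replica": "A"}` and the matching `replica: "B"` series into a single `{"__name__": "up", "instance": "ops"}` series.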
Outline of what's needed:
[ ] Thanos Debian package
[x] Prometheus instances need to advertise unique `external_labels`
- [x] Need to add labels: `instance` (or `name` or something similar) plus `replica` (`A` or `B`)
- [x] The labels above need to be filtered out before ingestion by our `global` instance for backwards compatibility
[ ] Deploy Thanos sidecar alongside each Prometheus instance
- [ ] Needs two ports, one each for the HTTP and gRPC interfaces, likely derived as an offset from the instance's own port
[ ] Deploy Thanos query component
- [ ] Decide where to deploy: alongside the existing Prometheus hosts in codfw/eqiad? Dedicated VMs?
- [ ] Needs two ports (HTTP and gRPC), plus the label(s) to consider for deduplication (`replica` in our case)
- [ ] Needs to locate and reach all other Thanos sidecars
- [ ] Needs redundancy and resiliency: deploy behind LVS; can run active/active (i.e. via discovery DNS records)
[ ] Configure and test datasource in Grafana
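A minimal sketch of the port scheme for the sidecars and of the endpoint list the query component needs to reach. It assumes hypothetical offsets of +10000 (HTTP) and +20000 (gRPC) from each Prometheus instance's port; the offsets, hostnames and the `sidecar_ports`/`store_endpoints` helpers are illustrative, not decided values.

```python
from typing import Dict, List, Tuple

# Hypothetical offsets added to each Prometheus instance's port; the real
# values are still to be decided in this task.
HTTP_OFFSET = 10000
GRPC_OFFSET = 20000


def sidecar_ports(prometheus_port: int) -> Tuple[int, int]:
    """Return (http_port, grpc_port) for the sidecar of one Prometheus instance."""
    http_port = prometheus_port + HTTP_OFFSET
    grpc_port = prometheus_port + GRPC_OFFSET
    # GRPC_OFFSET is the larger offset, so checking it covers both ports.
    if grpc_port > 65535:
        raise ValueError(f"port {prometheus_port} too high for the chosen offsets")
    return http_port, grpc_port


def store_endpoints(hosts: List[str], instances: Dict[str, int]) -> List[str]:
    """List every sidecar gRPC endpoint (host:port) the query component must reach.

    `instances` maps a Prometheus instance name to its port; instances are
    visited in name order for a stable result.
    """
    endpoints = []
    for host in hosts:
        for _name, port in sorted(instances.items()):
            _, grpc_port = sidecar_ports(port)
            endpoints.append(f"{host}:{grpc_port}")
    return endpoints
```

With these assumed offsets, an instance on port 9900 gets a sidecar on 19900 (HTTP) and 29900 (gRPC). Each `host:port` from `store_endpoints` would then be handed to `thanos query` as a store endpoint (the `--store` flag, `--endpoint` in newer releases), together with `--query.replica-label=replica`. Deriving ports from the instance port keeps them predictable per host, as long as the offsets never land on ports already in use.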