This task tracks the deployment of the stateless Thanos components. The big win is a single query endpoint that can reach all Prometheus instances and merge/deduplicate their results as needed.
Outline of what's needed:
- Thanos Debian package
- Prometheus instances need to advertise unique external_labels
- Need to add labels: instance (or name, or something similar) plus replica (A or B)
- The labels above need to be filtered out before ingestion by our global instance for backwards compatibility
- Deploy Thanos sidecar alongside each Prometheus instance (save for global)
- Needs two ports, one each for the HTTP and gRPC interfaces, likely derived as offsets from the instance's own port
- Deploy Thanos query component
- Deploy on thanos-fe2* hosts
- Deploy on thanos-fe1* hosts
- Needs two ports (HTTP and gRPC) and the labels to consider for deduplication (replica in our case)
- Needs to locate and reach all other Thanos sidecars
- Deploy behind LVS; this can be active/active (i.e. discovery DNS records), following https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service.
- Configure and test datasource in Grafana
- Audit and document how to port dashboards to Thanos (T256954)
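
To make the port and discovery items above more concrete, here is a rough Python sketch of how sidecar ports could be derived from each Prometheus instance's port, and how the resulting sidecar gRPC endpoints and the replica label would be handed to the query component. The offsets, instance name, hostname and ports are all hypothetical placeholders, not decisions made in this task. `--tsdb.path` and other sidecar options are left out. The `--store` flag is what older Thanos releases use to list sidecars; newer releases prefer `--endpoint`, so check against the packaged version.

```python
from dataclasses import dataclass

# Hypothetical offsets: the task only says the sidecar ports are
# "likely" offsets of the Prometheus instance's own port.
HTTP_OFFSET = 10000
GRPC_OFFSET = 20000


@dataclass(frozen=True)
class PrometheusInstance:
    name: str  # hypothetical instance name, e.g. "ops"
    host: str  # hypothetical hostname
    port: int  # port the Prometheus instance itself listens on


def sidecar_ports(instance):
    """Return (http_port, grpc_port) for the sidecar next to `instance`."""
    return instance.port + HTTP_OFFSET, instance.port + GRPC_OFFSET


def sidecar_args(instance):
    """Build a command line for the sidecar deployed alongside `instance`."""
    http_port, grpc_port = sidecar_ports(instance)
    return [
        "thanos", "sidecar",
        f"--prometheus.url=http://localhost:{instance.port}",
        f"--http-address=0.0.0.0:{http_port}",
        f"--grpc-address=0.0.0.0:{grpc_port}",
    ]


def query_args(instances, http_port, grpc_port, replica_label="replica"):
    """Build a command line for the query component.

    Each sidecar is listed by its gRPC endpoint, and `replica_label`
    names the label used for deduplication.
    """
    args = [
        "thanos", "query",
        f"--http-address=0.0.0.0:{http_port}",
        f"--grpc-address=0.0.0.0:{grpc_port}",
        f"--query.replica-label={replica_label}",
    ]
    for inst in instances:
        _, sidecar_grpc = sidecar_ports(inst)
        # Assumption: static list of sidecars; discovery could replace this.
        args.append(f"--store={inst.host}:{sidecar_grpc}")
    return args


if __name__ == "__main__":
    ops = PrometheusInstance(name="ops", host="prometheus1001", port=9900)
    print(" ".join(sidecar_args(ops)))
    print(" ".join(query_args([ops], http_port=10902, grpc_port=10901)))
```

Deriving both sidecar ports from the instance port keeps the mapping predictable when several Prometheus instances share a host, at the cost of reserving two extra port ranges.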