We have Thanos hardware coming in (T249538 + T249539) and will need to deploy Thanos on it, more specifically:
- Deploy the Thanos store gateway on the frontends: this component exposes the underlying object storage through Thanos' standard StoreAPI. The query component (T233956: Deploy Thanos (long-term storage) stateless components: sidecar and query) will know about the store gateway(s) and query them too.
- Set up the object storage (i.e. Swift). This is the standard frontend/backend Swift deployment we already run in production. Puppet changes to the existing Swift classes will be required to cater for additional clusters.
- Since this is a "green field" project, we should experiment with a multi-region Swift deployment. In other words, the Swift cluster will span eqiad and codfw, with 4 copies of the data and read affinity.
- The object storage will need to be reachable by the Thanos sidecar too (running on Prometheus hosts), so the authenticated API will need to be available over TLS.
- Set up the Thanos compactor. This component needs to run as a singleton and requires access to the object storage.
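
To make the "4 copies of data and read affinity" idea concrete, here is a minimal Python sketch of the intended read behaviour. It is an illustration only, not Swift's implementation: the `Replica` type, the host names and the even 2+2 split between eqiad and codfw are all assumptions made for the example (the task only states 4 copies across both sites).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    host: str    # hypothetical backend host name
    region: str  # site holding this copy: "eqiad" or "codfw"


def place_replicas(copies_per_region=2):
    """Return one Replica per copy; assumes an even split across both sites."""
    return [
        Replica(f"backend{i}.{region}.example", region)
        for region in ("eqiad", "codfw")
        for i in range(1, copies_per_region + 1)
    ]


def order_for_read(replicas, local_region):
    """Order replicas so copies in the local site are tried first.

    sorted() is stable, so replicas within the same site keep their
    original relative order.
    """
    return sorted(replicas, key=lambda r: 0 if r.region == local_region else 1)


replicas = place_replicas()
# A frontend in codfw reads from codfw copies before falling back to eqiad.
print([r.region for r in order_for_read(replicas, "codfw")])
# prints ['codfw', 'codfw', 'eqiad', 'eqiad']
```

In Swift itself this preference is expressed in the proxy server configuration (`sorting_method = affinity` together with `read_affinity`), so each site's frontends would prefer their local copies and only cross sites when those are unavailable.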
Reference diagrams (on gdocs now, to be published on wikitech once finalized):
Logical: https://docs.google.com/drawings/d/1FhE7_vBtqCao2qnDKUe9rNdNnoHT9NU-Fg815I-MIf0/edit?usp=sharing
Deployment: https://docs.google.com/drawings/d/1IBSrreH8UPXKMbRLF_-hW3kJiQ-gbTgQCXumOh6oN04/edit?usp=sharing