I noticed warnings for `thanos-be` hosts hovering at around 94% disk usage, so it is time to decide on and tweak Thanos retention. At the moment we have the following:
```
aggregation: retention time
raw: 54w
5m: 270w
1h: 270w
```
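For reference, these retention times map onto the Thanos compactor's retention flags (the flag names below are the real `thanos compact` flags; the rest of the invocation is elided):

```
thanos compact \
  --retention.resolution-raw=54w \
  --retention.resolution-5m=270w \
  --retention.resolution-1h=270w \
  ...
```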
Pending actions:
* [ ] Analyze the data from `thanos tools bucket inspect` to get a size breakdown (per instance, per aggregation, etc.)
* [ ] Come to a consensus on what to drop for now. An easy candidate is replicated blocks (`a`, `b`), since most of the time we have both Prometheus replicas up and running.
** After some analysis as of 2024-01-23, the cleanable replicated blocks for our biggest instance `ops` (older than 3 months) are as follows. Note that `0s` blocks were already cleaned up in the past, whereas `5m` and `1h` have not been cleaned up beyond the regular retention outlined above:
| resolution | size (GB) |
| --- | --- |
| 0s | 1533 |
| 5m0s | 20856 |
| 1h0m0s | 2483 |
* [ ] Do capacity planning based on the retention we'd like
** As it stands we're averaging 400 GB/day of new data, plus temporary spikes of ~2 TB twice a week when the compactor runs; that space gets freed afterwards:
{F41710556}
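To make the capacity planning above concrete, here is a back-of-envelope sketch based on the figures in this task (400 GB/day of growth plus a ~2 TB compactor spike); the `headroom` fraction and the function itself are illustrative, not existing tooling:

```python
def required_capacity_gb(retention_days, daily_growth_gb=400,
                         spike_gb=2000, headroom=0.15):
    """Rough steady-state capacity: data retained for `retention_days`
    at the current growth rate, plus the transient compactor spike,
    plus a fractional safety headroom."""
    steady = retention_days * daily_growth_gb
    return (steady + spike_gb) * (1 + headroom)

# e.g. one year of raw data at the current growth rate:
print(round(required_capacity_gb(365)))  # → 170200 (GB, ~170 TB)
```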
=== Block cleanup strategies ===
In the short term there is "low-hanging fruit": replicated data we can quickly delete to shed some storage pressure.
Currently we have two identical Prometheus hosts in each of `eqiad` and `codfw`, configured the same and doing the same work. They periodically upload their data blocks to Thanos for long-term storage. Each uploaded block is labeled with `prometheus=<instance>`, `site=<site>`, and `replica=[ab]`, so we can identify the block's source later when reading and de-duplicating data as needed (`thanos-query` does this job). Note that the `replica` label is intentionally abstracted from the hostname that uploaded the data, and can be (as of Jan 2024) either `a` or `b`.
So far we have kept replicated data blocks until Thanos' retention time expires. This is helpful because when data is missing from `replica=a` (e.g. during maintenance/reboots/etc.) it is possible to read the missing data from `replica=b`, and vice versa.
While this strategy works well, it also means we end up with a lot of basically-duplicated data. Thanos supports so-called [[ https://thanos.io/tip/components/compact.md/#vertical-compactions | vertical compaction ]] to de-duplicate overlapping blocks and merge them into one, however that is a more invasive change and comes with its own [[ https://thanos.io/tip/components/compact.md/#vertical-compaction-risks | caveats/risks ]]. For the scope of this task the focus will be on deleting duplicated blocks.
The strategy implemented so far is the following:
1. Select all blocks older than three months, with `site=codfw` or `site=eqiad` and resolution `0s` (i.e. raw data)
1. Group blocks by their `prometheus` instance and start/end time
1. From each group, pick the block with fewer samples (i.e. less data) and mark it for deletion
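The three steps above can be sketched as follows (the block records, field names, and the `pick_deletions` helper are illustrative, not the actual tooling's schema):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def pick_deletions(blocks, now=None):
    """Given block metadata dicts, return the replicated blocks to delete:
    for each (prometheus, min_time, max_time) group of old raw blocks,
    keep the block with more samples and mark the other(s) for deletion."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=90)  # "older than three months"

    # Step 1: raw (0s) blocks in eqiad/codfw older than the cutoff
    candidates = [
        b for b in blocks
        if b["site"] in ("eqiad", "codfw")
        and b["resolution"] == "0s"
        and b["max_time"] < cutoff
    ]

    # Step 2: group by instance and start/end time
    groups = defaultdict(list)
    for b in candidates:
        groups[(b["prometheus"], b["min_time"], b["max_time"])].append(b)

    # Step 3: within each group, keep the block with the most samples
    # and mark the smaller replica(s) for deletion
    to_delete = []
    for group in groups.values():
        if len(group) > 1:
            group.sort(key=lambda b: b["samples"], reverse=True)
            to_delete.extend(group[1:])
    return to_delete
```

Note the guard on `len(group) > 1`: a block whose twin replica is already missing is never selected, so we never delete the last copy of a time range.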