
Capacity planning/estimation for Thanos
Open, Needs Triage, Public

Description

CapEx time is upon us; this task will track the following:

  • Capacity estimation/planning for Thanos needs in terms of object storage, also in light of T351927
  • Estimation of thanos compact disk space needs; compaction currently brings titan disk utilization close to maximum, so we'll likely need to add some capacity there too

Object storage requirement estimation

Thanos data is written by Prometheus to object storage in the form of blocks of raw datapoints; each block represents a few hours of data. The raw blocks are then downsampled to lower resolutions (5m, 1h) and written back to storage. All hours-long blocks (at all resolutions) are also compacted into 14-day blocks for space savings. To each resolution we then apply a retention policy that deletes older blocks.

Due to storage space pressure, in T351927 we implemented additional block cleanup logic: it takes into account the fact that Prometheus in eqiad and codfw is replicated (two Prometheus hosts per site), so we also have blocks of very similar data that can be deleted if need be. We have already performed that deletion for blocks older than 3 months, hence below I only consider blocks newer than that, so the figures aren't skewed by the extra deletion.

Default retention strategy

This is the simplest strategy and the one implemented by Thanos: we only delete blocks once they are too old (i.e. past their retention period).
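
To make the cutoff rule concrete, here is a minimal sketch of that logic. It is an illustration only, not the actual Thanos implementation; the retention values mirror the tables below.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-resolution retention periods, mirroring the tables below.
RETENTION = {
    "raw": timedelta(weeks=54),
    "5m": timedelta(weeks=270),
    "1h": timedelta(weeks=270),
}

def is_expired(block_max_time: datetime, resolution: str, now: datetime) -> bool:
    # A block becomes eligible for deletion once its newest sample is older
    # than the retention period configured for its resolution.
    return now - block_max_time > RETENTION[resolution]

now = datetime.now(timezone.utc)
# A raw block whose newest sample is ~60 weeks old is past the 54w retention.
print(is_expired(now - timedelta(weeks=60), "raw", now))  # True
```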

For the last ~two months we get the following usage:

# days  GB     resolution  GB/day
76      11143  0s (raw)    146
73      8433   5m          115
71      1595   1h          22
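
The GB/day column is simply the observed GB divided by the number of days covered; a quick sketch of that arithmetic using the figures from the table above:

```python
# Observed usage over the last ~two months (figures from the table above).
observed = [
    # (resolution, days, GB)
    ("0s (raw)", 76, 11143),
    ("5m", 73, 8433),
    ("1h", 71, 1595),
]

for resolution, days, gb in observed:
    print(f"{resolution}: {gb / days:.1f} GB/day")
# 0s (raw): 146.6 GB/day
# 5m: 115.5 GB/day
# 1h: 22.5 GB/day
```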

Extrapolating from that we get:

Current retention

This is the retention policy we have configured in Puppet as of today.

# weeks  GB      resolution
54       55188   0s
270      217350  5m
270      41580   1h

That yields a grand total of ~314 TB needed. Thanos storage is ~130 TB total, meaning we'd need to more than double the capacity (!), which is not a great situation.
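
For reference, the extrapolation is just weeks x 7 x GB/day per resolution; a small sketch reproducing the table and the ~314 TB total:

```python
# GB/day rates from the usage table, current retention (weeks) from Puppet.
rates = {"0s": 146, "5m": 115, "1h": 22}
retention_weeks = {"0s": 54, "5m": 270, "1h": 270}

total_gb = 0
for resolution, gb_per_day in rates.items():
    gb = retention_weeks[resolution] * 7 * gb_per_day
    total_gb += gb
    print(f"{resolution}: {gb} GB")        # 55188 / 217350 / 41580
print(f"total: {total_gb / 1000:.0f} TB")  # ~314 TB, against ~130 TB available
```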

Proposed retention and hardware needs

As a reasonable compromise I think we can do the following: keep 0s and 5m data for slightly longer than a year (so year-over-year comparisons are possible), i.e. about 60w, and keep 1h data for longer since it is significantly less expensive to store. In other words (rounding up the numbers):

# weeks  GB      resolution
60       ~62000  0s
60       ~50000  5m
280      ~43000  1h

Or ~155TB total, meaning we need to add about 30-40TB to current Thanos storage.
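
Same arithmetic for the proposed retention, using the GB/day rates from above:

```python
# GB/day rates from the usage table, proposed retention in weeks.
rates = {"0s": 146, "5m": 115, "1h": 22}
proposed_weeks = {"0s": 60, "5m": 60, "1h": 280}

total_gb = sum(proposed_weeks[r] * 7 * rates[r] for r in rates)
print(f"total: {total_gb / 1000:.0f} TB")
# ~153 TB unrounded, ~155 TB with the rounded-up figures in the table above.
```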

In terms of hardware this translates into an additional two hosts of the 24x 8TB class, which would provide plenty of headroom (an additional ~100TB usable). We could probably also get away with two hosts of the 12x 4TB class (i.e. what thanos-be is now), though that wouldn't provide much headroom.
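
A rough sanity check of the two options, assuming 3x Swift replication and ignoring filesystem and other overhead (both assumptions, so real usable space will be somewhat lower):

```python
def usable_tb(hosts: int, disks_per_host: int, disk_tb: int, replicas: int = 3) -> float:
    # Raw capacity divided by the assumed replication factor; overhead is
    # ignored, so this is an upper bound on usable space.
    return hosts * disks_per_host * disk_tb / replicas

print(usable_tb(2, 24, 8))  # 128.0 -> in the ballpark of the ~100 TB of headroom above
print(usable_tb(2, 12, 4))  # 32.0  -> covers the 30-40 TB shortfall with little margin
```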

Titan hosts storage

The titan hosts run the block compaction processes described above and require temporary space to write the compacted blocks to disk before upload. The hosts have been managing, though they occasionally get tight on disk space; for this reason we should procure additional SSDs to install in them, to get ahead of the curve.

Hardware needs

We'll need 2x SSDs per host (across 4x hosts), so a total of 8x SSDs of 500GB capacity or greater to install in already existing hosts.

Event Timeline

cc @MatthewVernon and SRE-swift-storage for your input re: capacity planning and hardware needs for thanos-be, let me know what you think!

I think the proposed table should look like this?

# weeks  GB      resolution
60       ~62000  0s
60       ~50000  5m
280      ~43000  1h

I.e. 60W (as per text, a bit over a year), not 50W as you currently have? My back-of-an-envelope calculation has the GBs figures about right, though, so I don't think it changes the thrust of your argument.

I think on your numbers two 12x4 systems would be likely insufficient (or at least cutting it fine, which I'd rather not do), but two 24x8 systems would be good. It might be worth moving them to the new-style disk usage we have for recent ms-be* nodes too? i.e. JBOD rather than a set of 1-disk RAID-0 arrays. I bring that up because it changes how DC-ops configure the nodes, so it's worth remembering when ordering hw.

Obviously if you think there's value in continuing with the current retention policy, there's no reason we couldn't do that beyond budget ( :-) ), but I get the impression you don't.

> I think the proposed table should look like this?
>
> # weeks  GB      resolution
> 60       ~62000  0s
> 60       ~50000  5m
> 280      ~43000  1h
>
> I.e. 60W (as per text, a bit over a year), not 50W as you currently have? My back-of-an-envelope calculation has the GBs figures about right, though, so I don't think it changes the thrust of your argument.

Thank you, I've fixed the table to read 60w instead.

> I think on your numbers two 12x4 systems would be likely insufficient (or at least cutting it fine, which I'd rather not do), but two 24x8 systems would be good. It might be worth moving them to the new-style disk usage we have for recent ms-be* nodes too? i.e. JBOD rather than a set of 1-disk RAID-0 arrays. I bring that up because it changes how DC-ops configure the nodes, so it's worth remembering when ordering hw.

Indeed, I'm +1 on moving to the JBOD configuration.

> Obviously if you think there's value in continuing with the current retention policy, there's no reason we couldn't do that beyond budget ( :-) ), but I get the impression you don't.

Yeah, if we can get the bigger systems then I'm definitely for extending the retention beyond 60w as far as space allows, accounting for other thanos-swift users too, of course.
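
To put a number on "as far as space allows", a small sketch using the GB/day rates from the usage table above (the space budget is a placeholder to plug real figures into):

```python
# GB/day rates from the usage table in the task description.
rates = {"0s": 146, "5m": 115, "1h": 22}

def max_weeks(budget_tb: float, resolution: str) -> float:
    # How many weeks of a single resolution fit into a given space budget.
    return budget_tb * 1000 / (rates[resolution] * 7)

# Example: ~50 TB left over for 1h data would allow roughly 325 weeks of it.
print(f"{max_weeks(50, '1h'):.0f} weeks")  # 325
```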

Moving off the Q4 board since we have the hw in the capex spreadsheet and it'll be coming next FY.