Shorten Thanos retention
Closed, Resolved, Public

Description

This task also relates to thanos-swift capacity planning. Because Thanos itself is storing more data and other projects are also adding data (notably tegola/maps), space usage on thanos-swift has grown by about 13% over three months and is now at ~80%, meaning space will run out in roughly a quarter.
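As a rough check on the figures above, here is a minimal Python sketch of the time-to-full estimate. The task does not say whether "13%" means percentage points or relative growth, so both readings are shown. The function names and the 100% "full" threshold are assumptions made for illustration; in practice a cluster needs headroom well before 100%.

```python
import math


def quarters_until_full_linear(used_pct, growth_points_per_quarter, full_pct=100.0):
    """Quarters left if usage grows by a fixed number of percentage points per quarter."""
    return (full_pct - used_pct) / growth_points_per_quarter


def quarters_until_full_compound(used_pct, growth_rate_per_quarter, full_pct=100.0):
    """Quarters left if usage grows by a fixed relative rate per quarter."""
    return math.log(full_pct / used_pct) / math.log(1.0 + growth_rate_per_quarter)


# Figures from the task: ~80% used, ~13% growth over three months.
linear = quarters_until_full_linear(80.0, 13.0)
compound = quarters_until_full_compound(80.0, 0.13)
print(f"linear reading:   {linear:.1f} quarters until full")
print(f"compound reading: {compound:.1f} quarters until full")
```

Under either reading the estimate lands between one and two quarters, which is consistent with the "roughly a quarter" figure once some headroom is reserved.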

Also related to thanos-swift and MOSS: two hosts will be moved from thanos-swift to MOSS to implement Ceph, and the thanos-swift replication factor will be lowered from 4x to 3x.

To ease all of the operations above, I propose we move from the current 270w retention (~5y) to 110w (~2y). The oldest data we have in Thanos is a little over 2y old (June 2020), so this will effectively cap the data at more or less the current usage (modulo organic growth).

Event Timeline

A sample of said utilization (on thanos-be2002; other hosts are similar):

[Screenshot: 2022-06-30-134613_1330x559_scrot.png, 180 KB]

I did a brief analysis on space vs retention vs resolution:

| resolution | #samples | #series | bytes |
| 0s | 29.1B | 4B | 40TB |
| 5m | 5.8B | 2.6B | 30TB |
| 1h | 474.5M | 2.4B | 3.7TB |

I'll start by capping raw retention at one year (~20TB).
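The ~20TB figure can be approximated with a simple proportional estimate. This is a sketch under one strong assumption: that the raw bytes are spread evenly across the time span of stored data. Recent data is likely denser, so the real figure may be higher. The function name and the 106-week data span (standing in for "a little over 2y") are hypothetical values chosen for illustration.

```python
def estimated_tb_after_trim(current_tb, data_span_weeks, retention_weeks):
    """Scale current usage by the fraction of the stored time span that is kept."""
    kept_fraction = min(1.0, retention_weeks / data_span_weeks)
    return current_tb * kept_fraction


RAW_TB = 40.0             # raw (0s resolution) bytes from the analysis above
DATA_SPAN_WEEKS = 106     # assumption: "a little over 2y" of data since June 2020
RAW_RETENTION_WEEKS = 54  # roughly one year

raw_after = estimated_tb_after_trim(RAW_TB, DATA_SPAN_WEEKS, RAW_RETENTION_WEEKS)
print(f"estimated raw bytes after trim: ~{raw_after:.1f}TB")
```

A retention longer than the stored span leaves usage unchanged, which is why the kept fraction is capped at 1.0.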

Change 811932 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] thanos: split retention times based on resolution

https://gerrit.wikimedia.org/r/811932

Change 811933 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] thanos: trim raw samples retention to 54 weeks

https://gerrit.wikimedia.org/r/811933

Change 811932 merged by Filippo Giunchedi:

[operations/puppet@production] thanos: split retention times based on resolution

https://gerrit.wikimedia.org/r/811932

Change 811933 merged by Filippo Giunchedi:

[operations/puppet@production] thanos: trim raw samples retention to 54 weeks

https://gerrit.wikimedia.org/r/811933

Mentioned in SAL (#wikimedia-operations) [2022-07-11T08:06:24Z] <godog> trim thanos raw samples retention to 54w - T311690

This is done on the Thanos side, in the sense that the Thanos compactor has marked the blocks for deletion. The blocks will actually be deleted from object storage today, since the delete-delay option defaults to 48h.
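To illustrate the delay described above, here is a minimal Python sketch that computes the earliest time a marked block becomes eligible for deletion. The mark time used here is an assumption: it reuses the SAL timestamp of the retention change, whereas the compactor marks blocks on its own schedule some time after that. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Default delay between marking a block and deleting it, per the comment above.
DELETE_DELAY = timedelta(hours=48)


def earliest_deletion(marked_at, delete_delay=DELETE_DELAY):
    """Return the earliest time a block marked at `marked_at` may be deleted."""
    return marked_at + delete_delay


# Assumption: blocks were marked at the time of the SAL entry for the change.
marked = datetime(2022, 7, 11, 8, 6, 24, tzinfo=timezone.utc)
eligible = earliest_deletion(marked)
print(f"marked at {marked.isoformat()}, deletable from {eligible.isoformat()}")
```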

fgiunchedi changed the task status from Open to Stalled. (Jul 18 2022, 7:48 AM)

Space has now been freed, and we are at ~73% bytes used overall. I'll stall the task and check back in 45 to 50 days to reassess the situation and act accordingly.

Aklapper set Due Date to Sep 14 2022, 10:00 PM. (Jul 18 2022, 5:30 PM)

Mentioned in SAL (#wikimedia-operations) [2022-07-28T09:17:25Z] <Emperor> set thanos ring replicas to 3.95 T311690

fgiunchedi claimed this task.

Resolving, since Thanos retention has been trimmed. More space is being freed as part of T314835: wdqs space usage on thanos-swift.

Mentioned in SAL (#wikimedia-operations) [2022-09-05T10:55:19Z] <Emperor> set thanos ring replicas to 3.90 T311690

Mentioned in SAL (#wikimedia-operations) [2022-09-13T13:19:18Z] <Emperor> set thanos ring replicas to 3.85 T311690

Mentioned in SAL (#wikimedia-operations) [2022-09-17T12:17:30Z] <Emperor> set thanos ring replicas to 3.80 T311690

Mentioned in SAL (#wikimedia-operations) [2022-09-21T14:56:35Z] <Emperor> set thanos ring replicas to 3.75 T311690

Mentioned in SAL (#wikimedia-operations) [2022-10-10T08:28:56Z] <Emperor> set thanos ring replicas to 3.68 T311690

Mentioned in SAL (#wikimedia-operations) [2022-10-17T07:34:02Z] <Emperor> set thanos ring replicas to 3.60 T311690

Mentioned in SAL (#wikimedia-operations) [2022-10-24T08:35:27Z] <Emperor> set thanos ring replicas to 3.50 T311690

Mentioned in SAL (#wikimedia-operations) [2022-10-31T09:17:58Z] <Emperor> set thanos ring replicas to 3.40 T311690

Mentioned in SAL (#wikimedia-operations) [2022-11-07T09:08:00Z] <Emperor> set thanos ring replicas to 3.30 T311690

Mentioned in SAL (#wikimedia-operations) [2022-11-16T13:55:02Z] <Emperor> set thanos ring replicas to 3.20 T311690

Mentioned in SAL (#wikimedia-operations) [2022-11-23T09:16:32Z] <Emperor> set thanos ring replicas to 3.10 T311690

Mentioned in SAL (#wikimedia-operations) [2022-11-29T15:25:49Z] <Emperor> set thanos ring replicas to 3.0 T311690
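The gradual replica reduction logged above can be related to raw disk footprint with a short Python sketch. It assumes that physical usage scales linearly with the ring replica count and that the amount of logical data stays constant; it ignores rebalancing lag and organic growth. The step values are taken from the SAL entries; the function name is hypothetical.

```python
# Replica counts from the SAL entries, in chronological order.
REPLICA_STEPS = [3.95, 3.90, 3.85, 3.80, 3.75, 3.68, 3.60,
                 3.50, 3.40, 3.30, 3.20, 3.10, 3.0]


def relative_footprint(replicas, baseline=4.0):
    """Raw footprint relative to the baseline replica count (linear assumption)."""
    return replicas / baseline


for replicas in REPLICA_STEPS:
    share = relative_footprint(replicas)
    print(f"{replicas:.2f} replicas -> {share:.1%} of the 4x footprint")
```

Under these assumptions, the final step to 3.0 replicas leaves three quarters of the original 4x footprint for the same logical data.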