
audit graphite retention schemas
Closed, Resolved, Public

Description

At the moment our retention schema looks like this (https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/graphite.pp#L43):

[default]
pattern = .*
retentions = 1m:7d,5m:30d,15m:1y,1h:5y
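For reference, each `precision:duration` pair in `retentions` defines one whisper archive, and the number of points it stores is the duration divided by the precision. A minimal Python sketch of that arithmetic, assuming whisper's unit letters (`s`, `m`, `h`, `d`, `w`, `y`, with `y` meaning 365 days); `to_seconds` and `parse_retentions` are hypothetical helpers, not whisper's own parser:

```python
# Seconds per unit letter; assumes whisper's convention that "y" means 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400, "y": 365 * 86400}


def to_seconds(token: str) -> int:
    """Convert a token like '15m' or '5y' into seconds (hypothetical helper)."""
    value, unit = int(token[:-1]), token[-1]
    return value * UNIT_SECONDS[unit]


def parse_retentions(spec: str) -> list[tuple[int, int]]:
    """Turn '1m:7d,5m:30d' into [(seconds_per_point, points), ...]."""
    archives = []
    for pair in spec.split(","):
        precision, duration = (to_seconds(t) for t in pair.strip().split(":"))
        archives.append((precision, duration // precision))
    return archives


if __name__ == "__main__":
    for precision, points in parse_retentions("1m:7d,5m:30d,15m:1y,1h:5y"):
        print(f"{precision:>5}s/point x {points:>6} points = {precision * points}s retention")
```

The resulting point counts (10080, 8640, 35040, 43800) and retentions match the `whisper-info` output below.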

The 1h archive takes up almost half the space (525600 of 1170784 bytes per file, see below); it would be easy to drop it to 3y, for example.

# whisper-info carbon/agents/graphite1001-a/avgUpdateTime.wsp 
maxRetention: 157680000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 1170784

Archive 0
retention: 604800
secondsPerPoint: 60
points: 10080
size: 120960
offset: 64

Archive 1
retention: 2592000
secondsPerPoint: 300
points: 8640
size: 103680
offset: 121024

Archive 2
retention: 31536000
secondsPerPoint: 900
points: 35040
size: 420480
offset: 224704

Archive 3
retention: 157680000
secondsPerPoint: 3600
points: 43800
size: 525600
offset: 645184
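The sizes and offsets above follow from whisper's on-disk layout: a 16-byte metadata header, a 12-byte info record per archive, then 12 bytes per point (a 4-byte timestamp plus an 8-byte double). A small sketch that reproduces these numbers and each archive's share of the file, assuming that layout; `whisper_layout` is a hypothetical helper:

```python
METADATA_SIZE = 16      # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = 12  # offset, secondsPerPoint, points for each archive
POINT_SIZE = 12         # 4-byte timestamp + 8-byte double value


def whisper_layout(archives):
    """Return (offsets, sizes, total_bytes) for [(seconds_per_point, points), ...]."""
    offset = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    offsets, sizes = [], []
    for _seconds_per_point, points in archives:
        size = points * POINT_SIZE
        offsets.append(offset)
        sizes.append(size)
        offset += size  # archives are stored back to back
    return offsets, sizes, offset


if __name__ == "__main__":
    current = [(60, 10080), (300, 8640), (900, 35040), (3600, 43800)]
    offsets, sizes, total = whisper_layout(current)
    print("offsets:", offsets)
    print("fileSize:", total)
    for (spp, _), size in zip(current, sizes):
        print(f"{spp:>5}s archive: {size:>7} bytes ({size / total:.0%})")
```

With the current schema the 1h archive is 525600 of 1170784 bytes, about 45% of every file.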

Event Timeline

fgiunchedi claimed this task.
fgiunchedi raised the priority of this task to Needs Triage.
fgiunchedi updated the task description.
fgiunchedi added a project: Grafana.
fgiunchedi subscribed.
fgiunchedi triaged this task as Medium priority. (Apr 21 2015, 8:45 AM)
fgiunchedi added a project: acl*sre-team.
fgiunchedi set Security to None.

E.g. dropping the 1h retention from 5y to 2y yields a ~27% decrease in file size:

$ whisper-resize 5MinuteRate.wsp 1m:7d 5m:30d 15m:1y 1h:2y
Retrieving all data from the archives
Creating new whisper database: 5MinuteRate.wsp.tmp
Created: 5MinuteRate.wsp.tmp (855424 bytes)
Migrating data without aggregation...
Renaming old database to: 5MinuteRate.wsp.bak
Renaming new database to: 5MinuteRate.wsp
graphite1001:~$ ls -latr 5MinuteRate.wsp*
-rw-r--r-- 1 filippo wikidev 1170784 May 19 15:21 5MinuteRate.wsp.bak
-rw-r--r-- 1 filippo wikidev  855424 May 19 15:22 5MinuteRate.wsp
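The 855424-byte result can be predicted without touching the file: only the 1h archive changes, shrinking from 43800 to 17520 points. A back-of-the-envelope check in Python, assuming whisper's 12-byte points and 365-day years:

```python
POINT_SIZE = 12  # bytes per whisper point (timestamp + double)
HOUR, YEAR = 3600, 365 * 86400

old_size = 1170784                  # fileSize reported by whisper-info
old_1h_points = 5 * YEAR // HOUR    # 43800 points for 1h:5y
new_1h_points = 2 * YEAR // HOUR    # 17520 points for 1h:2y
new_size = old_size - (old_1h_points - new_1h_points) * POINT_SIZE

saving = 1 - new_size / old_size
print(new_size, f"{saving:.1%}")
```

The same arithmetic gives 960544 bytes (about 18% smaller) for the 1h:3y option mentioned in the description.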

From IRC: given the space shortage, and since Cassandra metrics are heavy hitters on Graphite, we can reduce retention for those metrics only, for now. cc @GWicke
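Reducing retention for Cassandra metrics only would mean adding a more specific section ahead of `[default]` in `storage-schemas.conf`, since carbon uses the first section whose `pattern` matches the metric name. The sketch below mimics that first-match lookup; the `^cassandra\.` pattern, the section name and its retention string are hypothetical placeholders, not the deployed values. Carbon only applies a schema when it creates a new `.wsp` file, so existing files would still need `whisper-resize`.

```python
import re

# Ordered (name, pattern, retentions) entries, like sections in storage-schemas.conf.
# The "cassandra" entry is a hypothetical example; "default" is the current schema.
SCHEMAS = [
    ("cassandra", re.compile(r"^cassandra\."), "1m:7d,5m:14d,15m:30d,1h:1y"),
    ("default", re.compile(r".*"), "1m:7d,5m:30d,15m:1y,1h:5y"),
]


def retentions_for(metric: str) -> str:
    """Return the retentions of the first schema whose pattern matches the metric."""
    for _name, pattern, retentions in SCHEMAS:
        if pattern.search(metric):
            return retentions
    raise LookupError(f"no schema matches {metric!r}")


if __name__ == "__main__":
    for metric in ("cassandra.example.metric", "carbon.agents.graphite1001-a.avgUpdateTime"):
        print(metric, "->", retentions_for(metric))
```

Ordering is the design point here: placing the catch-all `.*` section first would shadow every more specific pattern.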

So, another proposal after talking with @ori. The rationale is that we're mostly interested in recent data for investigation purposes, so older data can be kept for less time.
Both proposals cut 15m data to 30d and 1h data to 1y; they differ in keeping 1m data for 1d or 7d and 5m data for 7d or 14d, respectively.

# current
$ whisper-create.py old.wsp 1m:7d 5m:30d 15m:1y 1h:5y
Created: old.wsp (1170784 bytes)
# proposal 1
$ whisper-create.py new1.wsp 1m:1d 5m:7d 15m:30d 1h:1y
Created: new1.wsp (181216 bytes)
# proposal 2
$ whisper-create.py new2.wsp 1m:7d 5m:14d 15m:30d 1h:1y
Created: new2.wsp (309088 bytes)
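The byte counts reported by `whisper-create.py` can be cross-checked with the file-layout arithmetic (16-byte header, 12 bytes per archive record, 12 bytes per point). A self-contained sketch, assuming whisper's unit letters and 365-day years; `whisper_file_size` is a hypothetical helper, not part of whisper:

```python
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def seconds(token):
    """'15m' -> 900, '1y' -> 31536000."""
    return int(token[:-1]) * UNITS[token[-1]]


def whisper_file_size(spec):
    """Size in bytes of a whisper file for a retention spec like '1m:7d,5m:14d'."""
    archives = [pair.split(":") for pair in spec.split(",")]
    header = 16 + 12 * len(archives)  # metadata + one info record per archive
    points = sum(seconds(dur) // seconds(prec) for prec, dur in archives)
    return header + 12 * points       # 12 bytes per point


CANDIDATES = {
    "current": "1m:7d,5m:30d,15m:1y,1h:5y",
    "proposal 1": "1m:1d,5m:7d,15m:30d,1h:1y",
    "proposal 2": "1m:7d,5m:14d,15m:30d,1h:1y",
}

if __name__ == "__main__":
    baseline = whisper_file_size(CANDIDATES["current"])
    for name, spec in CANDIDATES.items():
        size = whisper_file_size(spec)
        print(f"{name:<10} {size:>8} bytes  {1 - size / baseline:6.1%} smaller")
```

This gives 1170784, 181216 and 309088 bytes, so proposal 1 is about 84.5% and proposal 2 about 73.6% smaller per file than the current schema.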

Change 212893 had a related patch set uploaded (by Filippo Giunchedi):
graphite: set a coarser aggregation policy to relieve storage pressure

https://gerrit.wikimedia.org/r/212893

Change 212893 merged by Filippo Giunchedi:
graphite: set a coarser aggregation policy to relieve storage pressure

https://gerrit.wikimedia.org/r/212893

We're now using the '1m:7d,5m:14d,15m:30d,1h:1y' retention policy (proposal 2).
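Under this policy each coarser archive is still filled by aggregating the next finer one. With the `aggregationMethod: average` and `xFilesFactor: 0.5` shown by `whisper-info` above, a 1h point is the average of the four 15m points it covers and is only written if at least half of them are known. A simplified Python sketch of that rule, modelled on whisper's propagation behaviour rather than copied from it; `propagate` is a hypothetical name:

```python
from typing import Optional, Sequence


def propagate(values: Sequence[Optional[float]], x_files_factor: float = 0.5) -> Optional[float]:
    """Average the finer-archive values covering one coarser point.

    None marks a missing point. Returns None (nothing written) when the
    fraction of known values is below x_files_factor.
    """
    known = [v for v in values if v is not None]
    if not values or len(known) / len(values) < x_files_factor:
        return None
    return sum(known) / len(known)


if __name__ == "__main__":
    print(propagate([1.0, 2.0, None, 3.0]))   # 3 of 4 known, averaged
    print(propagate([1.0, None, None, None]))  # 1 of 4 known, below 0.5
```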