
Delete "servers" metrics in graphite older than 60d
Closed, Declined (Public)

Description

I don't think there's much value in keeping a machine's system-level metrics more than 60d after it has been decommissioned or renamed. At the moment, files not updated in over 60d account for ~35% of all 'servers' metrics:

graphite1001:/var/lib/carbon/whisper/servers$ find . -type f -mtime +60 -ls | wc -l
240279
graphite1001:/var/lib/carbon/whisper/servers$ find . -type f -ls | wc -l
688898

We could prune metrics older than 60d from that hierarchy. Thoughts?
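For illustration, the cleanup proposed above could look roughly like the sketch below. It is only a sketch under a few assumptions: `prune_stale_whisper` is a hypothetical helper name, the `*.wsp` filter is added here (the counts above were taken over all files), and `-delete`, `-empty` and `-mindepth` assume GNU find. It lists candidates by default and only removes files when asked to.

```shell
# Sketch only: prune whisper files not modified in more than N days.
# prune_stale_whisper is a hypothetical helper, not an existing tool.
prune_stale_whisper() {
    root=$1              # e.g. /var/lib/carbon/whisper/servers
    days=${2:-60}        # age threshold in days, defaults to 60
    mode=${3:-dry-run}   # pass "delete" to actually remove files

    if [ "$mode" = "delete" ]; then
        # Remove stale whisper files (assumes the usual .wsp extension)
        find "$root" -type f -name '*.wsp' -mtime +"$days" -delete
        # Then remove directories left empty, deepest first
        find "$root" -mindepth 1 -type d -empty -delete
    else
        # Dry run: only print what would be removed
        find "$root" -type f -name '*.wsp' -mtime +"$days" -print
    fi
}
```

Running `prune_stale_whisper /var/lib/carbon/whisper/servers 60` first and reviewing the list before passing `delete` keeps the operation reversible up to the last step.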

Event Timeline

+1 for pruning these for servers that no longer exist. Will also make autocompletion in Grafana easier by not constantly offering completion options for metric names that don't have current values.

Aren't we already collecting all server metrics via Prometheus? If so, shouldn't we just drop the Diamond collector for those metrics?

Change 364687 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] graphite: keep 'servers' hierarchy for 60d

https://gerrit.wikimedia.org/r/364687

@Joe yes, we are, but dropping the Diamond collector for server data isn't in scope for this task.

Change 364687 abandoned by Filippo Giunchedi:
graphite: keep 'servers' hierarchy for 60d

Reason:
Chatted with Faidon about this on IRC: 60d is too aggressive for servers, and we might need the metrics later for long-term trends.

https://gerrit.wikimedia.org/r/364687

We'll need a different strategy for 'servers', e.g. to support long-term trending.