labmon1001 disk filling up
Closed, Resolved (Public)

Description

$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
udev                              10M     0   10M   0% /dev
tmpfs                             13G  1.4G   12G  11% /run
/dev/md0                          92G  5.1G   82G   6% /
tmpfs                             32G     0   32G   0% /dev/shm
tmpfs                            5.0M     0  5.0M   0% /run/lock
tmpfs                             32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/labmon1001--vg-data  2.1T  1.9T  109G  95% /srv

The bulk of the consumed disk space (1.6 TB) is in /srv/carbon/whisper/archived_metrics. We probably need some way to clean this up periodically.

Event Timeline

For now, I am going to delete all metrics more than 2 years old:

find . -mtime +730 -type f -delete

That didn't free up a ton of space, but enough to last us a few weeks.
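Before running a delete like the one above, it can help to preview what the same selection would remove. The following is a minimal sketch, assuming GNU find and du; the function name `preview_purge` is hypothetical and not part of anything deployed on labmon1001.

```shell
# Hypothetical helper: report how many files, and how many KiB on disk,
# a purge would remove. Nothing is deleted.
preview_purge() {
    dir="$1"    # directory to scan
    days="$2"   # same value that would be passed to -mtime, e.g. 730
    # Same selection as the delete command, but sizes are summed instead.
    find "$dir" -type f -mtime +"$days" -exec du -k {} + |
        awk '{ kib += $1; n++ } END { printf "%d files, %d KiB\n", n, kib }'
}
```

For example, `preview_purge /srv/carbon/whisper/archived_metrics 730` would summarize the files matched by the two-year cutoff.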

What's a useful timespan to save? 1 year? 6 months? 1 month?

I would think a year is a good first step.

+1

Change 360869 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Graphite labs archiver: Only save metrics for a year.

https://gerrit.wikimedia.org/r/360869

Change 360869 merged by Andrew Bogott:
[operations/puppet@production] Graphite labs archiver: Only save metrics for a year.

https://gerrit.wikimedia.org/r/360869

I've installed a cleanup cron on labmon1001. I'm going to give it a day and make sure that things are properly cleaned up, then this can be closed.
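The merged Puppet change is not reproduced in this thread, so the following is only a sketch of what a one-year cleanup job might look like, assuming GNU find. The function name `purge_old_metrics`, the script path, and the cron schedule in the comment are hypothetical; only the archive path and the one-year cutoff come from the discussion above.

```shell
# Hypothetical cleanup helper; not the actual contents of change 360869.
purge_old_metrics() {
    dir="$1"    # e.g. /srv/carbon/whisper/archived_metrics
    days="$2"   # e.g. 365 to keep roughly one year
    # Delete regular files not modified within the retention window.
    find "$dir" -type f -mtime +"$days" -delete
    # Remove directories left empty by the purge, but never $dir itself.
    find "$dir" -mindepth 1 -type d -empty -delete
}

# Example crontab entry (assumed schedule and script path), daily at 03:00:
# 0 3 * * * /usr/local/sbin/purge-archived-metrics
```

Removing the empty directories is a design choice rather than a requirement; it keeps the archive tree from accumulating empty per-project directories after their metrics age out.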

Today, usage is:

$ du -h -d0 /srv/carbon/whisper/archived_metrics/
1.5T /srv/carbon/whisper/archived_metrics/

That should drop by about 50% tomorrow.

Andrew claimed this task.

$ du -h -d0 /srv/carbon/whisper/archived_metrics/
897G /srv/carbon/whisper/archived_metrics/
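As a rough check against the expected 50% drop: treating the `du -h` figures as binary units (1.5T is about 1536G, and is itself a rounded value), going from 1.5T to 897G is a reduction of roughly 42%. The helper name `pct_drop` is hypothetical.

```shell
# Hypothetical helper: percentage reduction between two sizes in the same unit.
pct_drop() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f%%\n", (before - after) / before * 100 }'
}

# 1.5T taken as 1536G (an approximation, since du -h rounds its output).
pct_drop 1536 897   # prints 42%
```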

Change 424597 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] wmcs: monitoring: purge archived metrics every 90 days

https://gerrit.wikimedia.org/r/424597

Change 424597 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] wmcs: monitoring: purge archived metrics every 90 days

https://gerrit.wikimedia.org/r/424597