
Eventstreams graphite disk usage
Closed, ResolvedPublic

Description

I noticed eventstreams using a significant amount of disk space on graphite, with ~half of rdkafka metrics being more than 10d old and not updated. @Ottomata anything we could do here like aggregating in a different way or purge old metrics?

root@graphite1001:/var/lib/carbon/whisper/eventstreams# find rdkafka/ -type f -mtime +10 | wc -l
239220
root@graphite1001:/var/lib/carbon/whisper/eventstreams# find rdkafka/ -type f  | wc -l
518155

161G eventstreams

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 16 2017, 3:56 PM
fgiunchedi updated the task description. · Mar 16 2017, 3:57 PM

Yar, this is because many of the metrics are per-client. I'd like to know if clients start lagging, and there's not a real way to aggregate that.

But, we really don't need to keep history of this data. Can we delete certain data > 2 weeks old?

> Yar, this is because many of the metrics are per-client. I'd like to know if clients start lagging, and there's not a real way to aggregate that.
>
> But, we really don't need to keep history of this data. Can we delete certain data > 2 weeks old?

Yes, we could for sure; we already do something similar for the instances hierarchy.
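The cleanup referred to here boils down to periodically deleting whisper files that haven't been written recently. A minimal sketch with `find` (the path and 15-day threshold are assumptions; the real job is managed via Puppet in the change below):

```shell
# Sketch of a periodic cleanup for stale per-client whisper files.
# WHISPER_DIR and the threshold are assumptions, not the actual Puppet config.
WHISPER_DIR="${WHISPER_DIR:-/var/lib/carbon/whisper/eventstreams/rdkafka}"

if [ -d "$WHISPER_DIR" ]; then
    # Delete metric files that have not been updated in the last 15 days
    find "$WHISPER_DIR" -type f -name '*.wsp' -mtime +15 -delete
    # Prune directories left empty by the deletion
    find "$WHISPER_DIR" -mindepth 1 -type d -empty -delete
fi
```

Deleting the `.wsp` files is safe in the sense that graphite simply recreates them if a client resumes publishing; the cost is losing history for that metric, which is acceptable here since the data doesn't need to be kept long-term.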

Change 343609 had a related patch set uploaded (by Filippo Giunchedi):
[operations/puppet] graphite: cleanup eventstreams rdkafka stale data

https://gerrit.wikimedia.org/r/343609

Nuria moved this task from Incoming to Radar on the Analytics board. · Mar 20 2017, 3:48 PM

Change 343609 merged by Filippo Giunchedi:
[operations/puppet@production] graphite: cleanup eventstreams rdkafka stale data

https://gerrit.wikimedia.org/r/343609

fgiunchedi closed this task as Resolved. · Mar 22 2017, 7:17 PM
fgiunchedi claimed this task.

eventstreams is at 110G and cleaned up periodically, good enough for now

fgiunchedi reopened this task as Open. · Jun 21 2017, 2:21 PM

Reopening: starting around 6/6, eventstreams has been creating a lot of metrics, consuming ~20% of graphite disk space in 8 days; it is now at around 400G.

We're already cleaning metrics older than 15d, but that doesn't seem to be enough with a big influx of metrics like this.

fgiunchedi removed fgiunchedi as the assignee of this task. · Jun 21 2017, 2:21 PM
fgiunchedi triaged this task as High priority.
fgiunchedi removed a project: Patch-For-Review.
elukey added a subscriber: elukey. · Jun 25 2017, 9:02 AM

Change 361818 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] graphite: lower down eventstreams whisper files retention

https://gerrit.wikimedia.org/r/361818

Change 361818 merged by Elukey:
[operations/puppet@production] graphite: lower down eventstreams whisper files retention

https://gerrit.wikimedia.org/r/361818
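For context, whisper file retention is controlled by carbon's `storage-schemas.conf`: the first stanza whose `pattern` regex matches a metric name determines how long (and at what resolution) its data points are kept. A hypothetical stanza for this hierarchy (the actual pattern and retention values live in the Puppet change above):

```ini
# Hypothetical example; real values are defined in operations/puppet.
[eventstreams_rdkafka]
pattern = ^eventstreams\.rdkafka\.
retentions = 1m:7d
```

Note that shorter retentions shrink each new whisper file, but existing files keep their size until deleted or resized, which is why the periodic cleanup is still needed alongside the schema change.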

fgiunchedi closed this task as Resolved. · Jul 7 2017, 10:25 AM
fgiunchedi claimed this task.

Resolving this for now, will reopen if necessary

fgiunchedi reopened this task as Open. · Aug 29 2017, 7:45 AM

Reopening: rdkafka metrics for eventstreams have been out of control for the past couple of days.

graphite1001:~$ du -hcs /var/lib/carbon/whisper/eventstreams/rdkafka
590G	/var/lib/carbon/whisper/eventstreams/rdkafka

Ideally not so many metrics would be pushed in the first place; we should also get more aggressive with the cleaning, maybe 5 days (from 10 days now).

fgiunchedi added a project: observability.

Change 374500 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::graphite::production: lower down eventstreams rdkafka retention

https://gerrit.wikimedia.org/r/374500

Change 374500 merged by Elukey:
[operations/puppet@production] role::graphite::production: lower down eventstreams rdkafka retention

https://gerrit.wikimedia.org/r/374500

elukey added a comment (edited). · Aug 29 2017, 8:01 AM

The other step to take would be to limit the amount of data that we store for librdkafka, because with so many clients it is impossible to keep track of all the metrics (https://grafana.wikimedia.org/dashboard/db/eventstreams doesn't even load anymore).

fgiunchedi closed this task as Resolved. · Oct 15 2018, 2:22 PM

We're doing well space-wise now:

# du -hcs /var/lib/carbon/whisper/eventstreams/
4.8G	/var/lib/carbon/whisper/eventstreams/
Aklapper edited projects, added Analytics-Radar; removed Analytics. · Jun 10 2020, 6:44 AM