prometheus.puppet: enable experimental --storage.tsdb.retention.size flag and set on tools-prometheus-03
Closed, ResolvedPublic

Description

We are getting periodic out-of-space errors on tools-prometheus-03 (roughly once a week, unless it gets restarted
first, which also happens relatively often).

One possible solution is to bound the retention by the disk space used instead of by a time period, using the
option --storage.tsdb.retention.size.

This task is to:

  • parametrize the puppet classes and template needed to use that new flag
  • configure tools-prometheus-03 to use that flag instead of time-based retention
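
As a rough illustration of how a value for the flag might be chosen, the sketch below derives a size limit from the disk capacity while leaving some headroom free. The function name, the 20% default headroom, and the 100 GiB disk in the usage line are hypothetical, not taken from the puppet change; the GB suffix is assumed to be read by Prometheus as 2**30 bytes.

```python
def retention_size_flag(disk_bytes: int, headroom_percent: int = 20) -> str:
    """Build a --storage.tsdb.retention.size argument for a disk of `disk_bytes`.

    Hypothetical helper: `headroom_percent` of the disk is left unused so
    that data outside the size limit still has room to grow.
    """
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    # Integer arithmetic only, to avoid floating-point rounding surprises.
    usable_bytes = disk_bytes * (100 - headroom_percent) // 100
    # Assumption: Prometheus treats "GB" as 2**30 bytes.
    usable_gb = usable_bytes // 2**30
    if usable_gb < 1:
        raise ValueError("disk too small for a whole-GB retention size")
    return f"--storage.tsdb.retention.size={usable_gb}GB"


# Example with an assumed 100 GiB data disk.
print(retention_size_flag(100 * 2**30))
```

The headroom matters because a limit set equal to the full disk size would leave no margin for the server to notice and delete old data before the disk fills up.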

Helper dashboard: https://grafana-labs.wikimedia.org/goto/52qHqyXGk

Event Timeline

dcaro triaged this task as High priority. Apr 19 2021, 1:44 PM
dcaro created this task.

Change 681107 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] prometheus: allow using the --storage.tsdb.retention.size option

https://gerrit.wikimedia.org/r/681107

Change 681107 merged by David Caro:

[operations/puppet@production] prometheus: allow using the --storage.tsdb.retention.size option

https://gerrit.wikimedia.org/r/681107