
Investigate or create alerting/messaging around when an NFS filesystem is ready for a cleanup
Closed, Duplicate (Public)

Description

To be a bit more proactive about NFS management, it would be good to have something tell us when we really ought to start reducing dumps and tools NFS usage (and similar things), before an actual critical alarm goes off.

It may be that the standing definition of "critical" in icinga will do, or that a lower status will at least ping us in IRC. The purpose of this task is to confirm that this is the case, or to implement another mechanism.

There may even be a way to autogenerate a task or something.
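
As a rough illustration of the "warn before critical" idea described above, here is a minimal Python sketch that classifies a filesystem's usage into three states. The threshold values, the state names, and the helper names (`classify_usage`, `check_mount`) are assumptions made up for this example; they are not taken from icinga or from any existing check.

```python
import shutil

# Hypothetical thresholds, as fractions of total capacity.
# These numbers are illustrative only and do not come from this task.
WARN_FRACTION = 0.70
CRIT_FRACTION = 0.90


def classify_usage(used_bytes, total_bytes,
                   warn=WARN_FRACTION, crit=CRIT_FRACTION):
    """Return 'ok', 'cleanup', or 'critical' for a used/total pair."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= crit:
        return "critical"
    if fraction >= warn:
        # Early signal: time to start cleaning up, well before the alarm.
        return "cleanup"
    return "ok"


def check_mount(path):
    """Classify the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_usage(usage.used, usage.total)
```

A "cleanup" result is the point at which something could ping IRC or open a task; where exactly that threshold should sit is the open question here.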

Event Timeline

Bstorm triaged this task as Medium priority. Mar 21 2019, 6:26 PM
Bstorm created this task.

We recently saw that what comes out of icinga for this is actually quite awful and inaccurate.

Change 578545 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] prometheus: fix int format in node_directory_size

https://gerrit.wikimedia.org/r/578545

Change 578545 merged by Jhedden:
[operations/puppet@production] prometheus: fix int format in node_directory_size

https://gerrit.wikimedia.org/r/578545