
[toolsdb] Add filesystem space alerts
Closed, Resolved · Public

Description

As a follow-up to T409244 (Toolforge outage: toolsdb out of space), we must have alerts for tools(db) hosts running out of filesystem space. These alerts could have the following structure:

  1. Predictive ("host will run out of space in X hours/days") alerts, at critical level
  2. Pages for a selection of hosts (e.g. toolsdb) when free space falls below a threshold (e.g. 15%)

The idea is that operators are first notified via predictive alerts; if that goes unaddressed and the trend continues, a page is eventually issued for some or all hosts.
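A minimal Python sketch of the two-tier structure above, assuming free-space samples for a host are available as (timestamp, free) pairs. The predictive tier fits a least-squares line to the samples and fires if the extrapolated free space drops below zero within a horizon, similar in spirit to Prometheus's `predict_linear()`. The paging tier fires only for hosts flagged as pageable, once the current free fraction is below a threshold. The names `Sample`, `predict_free` and `evaluate`, the 24-hour horizon and the `pageable` flag are hypothetical, and treating the 15% example as a free-space fraction is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float     # Unix timestamp in seconds
    free: float  # free space, in any consistent unit (e.g. GiB)


def predict_free(samples, horizon_s):
    """Least-squares line through the samples, extrapolated horizon_s past the last one."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(s.t for s in samples) / n
    mean_f = sum(s.free for s in samples) / n
    var_t = sum((s.t - mean_t) ** 2 for s in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((s.t - mean_t) * (s.free - mean_f) for s in samples) / var_t
    intercept = mean_f - slope * mean_t
    return intercept + slope * (samples[-1].t + horizon_s)


def evaluate(samples, size, horizon_s=24 * 3600, page_threshold=0.15, pageable=False):
    """Return the list of alerts that would fire for one host (hypothetical helper)."""
    alerts = []
    # Tier 1 (all hosts): critical alert if the trend says we run out within the horizon.
    if predict_free(samples, horizon_s) < 0:
        alerts.append("critical: predicted to run out of space")
    # Tier 2 (selected hosts only, e.g. toolsdb): page once free space is below the threshold.
    if pageable and samples[-1].free / size < page_threshold:
        alerts.append("page: free space below threshold")
    return alerts
```

For example, with free space dropping 10 units per hour from 210 to 160 on a 1000-unit filesystem, the 24-hour projection is negative, so only the critical alert fires; the page is added once free space is below 15% on a pageable host.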

Details

Related Changes in GitLab:
Title | Reference | Author | Source branch | Dest branch
toolsdb: Add disk space alerts | repos/cloud/toolforge/alerts!54 | fnegri | main-Ib2dd93c9988584b6acbe0cdd4427fbaeafd80ce9 | main

Event Timeline

That sounds good to me, yes; it's similar to the other space.

The predictive alert can be as simple as "when it hits 20%" if actual prediction turns out to be too flaky (Ceph has proven hard to predict, as its usage grows and shrinks often). Happy to make it projection-based in the future if that proves reliable enough.
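The fallback suggested above can be sketched as follows, assuming the same kind of (timestamp, free) samples. As one hypothetical way to decide whether a projection is "reliable enough", the sketch uses the coefficient of determination (R²) of a least-squares fit: a steadily shrinking filesystem fits a line well and is projected, while a strongly oscillating series like the Ceph usage described above gets a low R² and falls back to the static "below 20% free" rule. The helpers `linear_fit` and `check_space`, the 0.9 cutoff and reading 20% as free space are all assumptions, not part of the task.

```python
def linear_fit(ts, ys):
    """Least-squares fit y = intercept + slope * t; returns (slope, intercept, r2)."""
    n = len(ts)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (t, y) pairs of equal length")
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = cov / var_t
    intercept = mean_y - slope * mean_t
    # R^2 = 1 - SS_res / SS_tot; a perfectly flat series is treated as fully explained.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


def check_space(ts, free, size, horizon_s, static_threshold=0.20, min_r2=0.9):
    """Return (should_alert, rule_used) for one filesystem (hypothetical helper)."""
    slope, intercept, r2 = linear_fit(ts, free)
    if r2 >= min_r2:
        # Trend is close to linear: trust the projection over the horizon.
        predicted = intercept + slope * (ts[-1] + horizon_s)
        return predicted < 0, "predictive"
    # Too noisy to project: fall back to a plain threshold on current free space.
    return free[-1] / size < static_threshold, "static"
```

In this design the static rule only applies when the fit is poor; a separate threshold page, as proposed in the description, would still be needed to catch a host that is already low on space but not trending down.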

fnegri triaged this task as High priority. Nov 10 2025, 9:17 AM
fnegri renamed this task from Add filesystem space alerts for tools(db) to [toolsdb] Add filesystem space alerts. Nov 27 2025, 2:57 PM
fnegri changed the task status from Open to In Progress. Dec 5 2025, 5:09 PM