As a follow-up to T409244 (Toolforge outage: toolsdb out of space), we must have alerts for tools(db) hosts running out of filesystem space. These alerts could have the following structure:
- Predictive ("host will run out of space in X hours/days") alerts, at critical level
- Pages for a selection of hosts (e.g. toolsdb) when a threshold is met (e.g. 15%)
The idea is that operators are first alerted via the predictive alerts. For some or all hosts, if that alert goes unaddressed and the trend continues, a page is eventually issued.
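
To make the two tiers concrete, here is a minimal sketch of the decision logic in Python. It is illustrative only, not a proposed implementation: the function names, the 96-hour prediction horizon, the hostnames, and the reading of the 15% threshold as "15% free space remaining" are all assumptions. In practice this would presumably be expressed as alerting rules in the monitoring stack rather than as standalone code.

```python
from typing import Collection, Optional, Sequence, Tuple


def hours_until_full(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Estimate hours from the latest sample until free space reaches zero.

    samples holds (time_in_hours, free_bytes) pairs, oldest first.
    Uses a least-squares linear fit; returns None when there is too
    little data or free space is not shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None  # free space is flat or growing
    intercept = mean_f - slope * mean_t
    time_at_zero = -intercept / slope
    return max(0.0, time_at_zero - samples[-1][0])


def classify(
    host: str,
    free_fraction: float,
    hours_left: Optional[float],
    paging_hosts: Collection[str],
    page_threshold: float = 0.15,  # assumed: fraction of space still free
    horizon_hours: float = 96.0,   # assumed prediction horizon
) -> str:
    """Return "page", "critical" or "ok" for one filesystem."""
    # Tier 2: hard threshold, only for the selected hosts.
    if host in paging_hosts and free_fraction <= page_threshold:
        return "page"
    # Tier 1: predictive alert for every host.
    if hours_left is not None and hours_left <= horizon_hours:
        return "critical"
    return "ok"
```

For example, a filesystem losing 10 units of free space per hour with 80 units left is predicted to fill in 8 hours. That yields a critical alert on any host, and a page only on a host in the paging selection whose free space is already at or below the threshold.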