Alert in -releng when permanent hosts have low disk space
Closed, Resolved, Public

Description

Let's stop having our alerting be people complaining of test failures.

Event Timeline

greg triaged this task as High priority. May 2 2018, 5:55 PM
greg created this task.
greg removed a project: Quibble.
greg added a subscriber: thcipriani.

Or, as @thcipriani just suggested, let's make a cronjob that deletes the workspaces before it becomes an issue, as that's what we do when it is an issue.

Looking at the RelEng SAL, it seems like all we've been doing is `rm -rf /srv/jenkins-workspace/workspace/*`.

(( $(df --output=pcent /srv | awk -F '%' '!/Use/ {print $1}') > 95 )) && \
    rm -rf /srv/jenkins-workspace/workspace/*

We could cron that, have a dedicated workspace cleanup job, or have the jobs clean up after themselves (which might be the Right Thing). Adding @hashar to see if he has any preferences among the options here.
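For the cron option, here is a rough sketch of how the one-liner above could be wrapped into a script. The function names (`disk_usage_pct`, `over_threshold`, `cleanup_if_full`) and the configurable threshold are made up for illustration and are not an existing tool; it also assumes GNU `df` (for `--output=pcent`) and bash.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup helper: all function names here are assumptions,
# not part of any existing RelEng tooling.

# Print the used-space percentage (integer, no '%') of the filesystem
# that holds the path given as $1. Requires GNU df.
disk_usage_pct() {
    df --output=pcent "$1" | awk -F '%' 'NR > 1 { gsub(/ /, "", $1); print $1 }'
}

# Succeed when the usage in $1 is strictly above the threshold in $2.
over_threshold() {
    (( $1 > $2 ))
}

# Delete everything directly under the workspace directory ($2) when the
# filesystem holding $1 is above the threshold ($3, default 95).
cleanup_if_full() {
    local mount=$1 workspace=$2 threshold=${3:-95}
    local pct
    pct=$(disk_usage_pct "$mount") || return 1
    if over_threshold "$pct" "$threshold"; then
        # ${workspace:?} aborts if the variable is empty, so this can
        # never expand to a bare "/*".
        rm -rf -- "${workspace:?}"/*
    fi
}
```

A cron entry could then source this file and call `cleanup_if_full /srv /srv/jenkins-workspace/workspace`. Like the one-liner, it does not check whether a job is currently using a workspace, so it carries the same risk of breaking running builds.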

hashar assigned this task to thcipriani.

Done as part of T201224. We now automatically depool the faulty slaves and send an IRC notification.