deployment-jobrunner01/Free space - all mounts is CRITICAL
Closed, Resolved, Public

Description

Disk usage on the / partition of deployment-jobrunner01 is above 60%, which triggers a critical disk usage alert from Shinken:

deployment-jobrunner01:~$ df --human --type ext4
Filesystem                          Size  Used Avail Use% Mounted on
/dev/vda1                            18G   12G  5.3G  69% /
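
For reference, the kind of threshold comparison behind the alert can be sketched in shell. This is only an illustration, not the actual Shinken check: the function names `usage_state` and `mount_usage` are hypothetical, and the 60% limit is taken from the description above.

```shell
#!/bin/sh
# Hypothetical helper (not the real Shinken check): classify an integer
# usage percentage against an integer threshold.
usage_state() {
    used=$1     # percent used, e.g. 69
    limit=$2    # alert threshold, e.g. 60
    if [ "$used" -gt "$limit" ]; then
        echo CRITICAL
    else
        echo OK
    fi
}

# Hypothetical helper: print the Use% of the filesystem holding a path,
# without the trailing '%'. df -P keeps each filesystem on a single line.
mount_usage() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report the state of the root filesystem against the 60% threshold.
usage_state "$(mount_usage /)" 60
```

With the figures from the df output above, `usage_state 69 60` reports CRITICAL.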

Event Timeline

du -d 1 -m /var/log/*|sort -rn|head -n5
1472	/var/log/hhvm
1297	/var/log/apache2
1021	/var/log/mediawiki
690	/var/log/account
425	/var/log/atop

Cleaned up stack traces from /var/log/hhvm.

/var/log/hhvm/error.log is not rotated. I emptied it, since all the errors in it dated from before Feb 23 and have been fixed since.
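
One common way to empty a log that a running daemon still has open is to truncate it in place rather than delete it, because space held by a deleted file is not released until the process closes it. The sketch below shows that approach on a scratch file. It is an assumption about the technique, not a record of the command that was actually run here, and `empty_log` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: empty a file in place. The redirection truncates
# the existing file, so the path and inode stay the same for any process
# that already has it open.
empty_log() {
    : > "$1"
}

# Demonstration on a scratch file rather than the real error.log.
demo=$(mktemp)
echo "old stack trace" > "$demo"
empty_log "$demo"
wc -c < "$demo"    # size in bytes after truncation
rm -f "$demo"
```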

Mentioned in SAL [2016-03-17T09:04:14Z] <hashar> Upgrading hhvm and related extensions on jobrunner01 T130179

Mentioned in SAL [2016-03-17T09:34:51Z] <hashar> deployment-jobrunner01 deleted /var/log/apache/*.gz T130179

Summary

The hhvm error.log is not rotated, so I trimmed it.

The Apache vhost log for the jobrunner RPC and /var/log/mediawiki/jobrunner.log were both filling with errors because some jobs were still for labswiki. I deleted the related keys from Redis (T130184).

I think it is under control now :-}

Krenair assigned this task to hashar.