deployment-jobrunner01/Free space - all mounts is CRITICAL
Closed, Resolved · Public

Assigned To

Authored By

	hashar
	Mar 17 2016, 8:55 AM

Description

Disk usage on the / partition of deployment-jobrunner01 is above 60%, which triggers a CRITICAL alert from Shinken:

deployment-jobrunner01:~$ df --human --type ext4
Filesystem                          Size  Used Avail Use% Mounted on
/dev/vda1                            18G   12G  5.3G  69% /
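
The check behind this alert can be approximated by hand. Below is a minimal sketch, assuming a plain POSIX shell; the function names `usage_percent` and `evaluate_usage` are hypothetical, and the 60% default only mirrors the threshold quoted above. This is not the actual Shinken check.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the real Shinken/Nagios plugin.

# Print the "Use%" figure (without the % sign) for one mount point.
# df -P forces the POSIX one-line-per-filesystem layout.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare a usage figure against a threshold (default 60, as in this task).
# Returns 2 when the threshold is exceeded, following the Nagios-style
# convention for CRITICAL, and 0 otherwise.
evaluate_usage() {
    mount_point=$1
    used=$2
    threshold=${3:-60}
    if [ "$used" -gt "$threshold" ]; then
        echo "CRITICAL: $mount_point at ${used}% (threshold ${threshold}%)"
        return 2
    fi
    echo "OK: $mount_point at ${used}%"
}
```

With the figures from the df output above, `evaluate_usage / 69` reports CRITICAL; on a live host, `evaluate_usage / "$(usage_percent /)"` would feed in the current value.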

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		hashar	T130179 deployment-jobrunner01/Free space - all mounts is CRITICAL
		Resolved		hashar	T130184 beta cluster 'labswiki' not referenced in all-labs.dblist causing jobrunner to error out

Event Timeline

hashar created this task. Mar 17 2016, 8:55 AM

Restricted Application added a subscriber: Aklapper. Mar 17 2016, 8:55 AM

du -d 1 -m /var/log/*|sort -rn|head -n5
1472	/var/log/hhvm
1297	/var/log/apache2
1021	/var/log/mediawiki
690	/var/log/account
425	/var/log/atop

Cleaned up stack traces from /var/log/hhvm.

/var/log/hhvm/error.log is not rotated. I emptied it, since all of its errors dated from before Feb 23 and have since been fixed.
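
The task does not say how the log was emptied. One common approach, sketched here under that caveat, is to truncate the file in place rather than delete it, so that a daemon holding the file open keeps writing to the same inode. The helper name `empty_log` is hypothetical.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not necessarily what was run on the host.

# Truncate a log file to zero bytes without removing or recreating it.
empty_log() {
    log_file=$1
    if [ ! -f "$log_file" ]; then
        echo "no such file: $log_file" >&2
        return 1
    fi
    # ':' produces no output, so the redirection just truncates the file.
    : > "$log_file"
}
```

For example, `empty_log /var/log/hhvm/error.log`. If the writing process did not open the file in append mode, it may continue at its old offset and leave a sparse file, so this is worth verifying afterwards with `du`.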

Mentioned in SAL [2016-03-17T09:04:14Z] <hashar> Upgrading hhvm and related extensions on jobrunner01 T130179

Mentioned in SAL [2016-03-17T09:34:51Z] <hashar> deployment-jobrunner01 deleted /var/log/apache/*.gz T130179

hashar created subtask T130184: beta cluster 'labswiki' not referenced in all-labs.dblist causing jobrunner to error out. Mar 17 2016, 10:05 AM

hashar closed subtask T130184: beta cluster 'labswiki' not referenced in all-labs.dblist causing jobrunner to error out as Resolved. Mar 17 2016, 10:29 AM

Summary

The hhvm error.log is not rotated; I trimmed it.

The Apache vhost log for the jobrunner RPC, as well as /var/log/mediawiki/jobrunner.log, was filling up with errors because some queued jobs were still for labswiki. I deleted the related keys from Redis (T130184).

I think it is under control now :-}

Krenair closed this task as Resolved. Apr 9 2016, 11:47 PM

Krenair assigned this task to hashar.
