Shinken-wm keeps reporting to IRC:
Free space - all mounts on integration-slave-jessie-1002 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1002.diskspace.root.byte_percentfree (<11.11%)
We need to figure out what is using up the disk space and what exactly we can clean up to prevent this alert from coming back.
Yeah / is almost full:
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        19G   17G  1.1G  95% /
$ df -ih /
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/vda3        1.2M  830K  387K   69% /
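The alert fires when the free-space percentage drops below 11.11%. A minimal sketch of the same check run by hand on a slave follows. The function names are hypothetical, and the monitoring metric (byte_percentfree) may be computed slightly differently, for example in how it treats reserved blocks.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of free bytes on the
# filesystem that holds the path given as $1.
percent_free() {
    # df -P forces the POSIX column layout, -k reports 1024-byte blocks.
    # Column 2 is the total size, column 4 the space still available.
    df -Pk "$1" | awk 'NR == 2 && $2 > 0 { printf "%.2f\n", $4 * 100 / $2 }'
}

# Hypothetical helper: succeed when free space on $1 is below the limit $2.
# The default limit is the 11.11% threshold quoted in the alert text.
is_low_on_space() {
    free=$(percent_free "$1")
    awk -v free="$free" -v limit="${2:-11.11}" 'BEGIN { exit !(free < limit) }'
}

printf '/ has %s%% free\n' "$(percent_free /)"
```

Calling `is_low_on_space /` in a cron job or a Jenkins pre-build step would allow a cleanup to run before the alert reaches IRC.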
/tmp is full of files:
android-tmp-robolectric999080544390212506
android-tmp-robolectric999120214029539769
android-tmp-robolectric999406133185790303
Mentioned in SAL (#wikimedia-releng) [2017-04-10T20:49:46Z] <hashar> integration-slave-jessie-1002 : cleaning up /tmp: sudo find /tmp -path '/tmp/android-tmp-robo*' -delete # T162635
Mentioned in SAL (#wikimedia-releng) [2017-04-10T20:52:49Z] <hashar> integration-slave-jessie-1001 : cleaning up /tmp: sudo find /tmp -path '/tmp/android-tmp-robo*' -delete # T162635
Cleared out /tmp.
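If the leftovers come back, the manual cleanup above could be turned into a periodic one. The following is only a sketch: the function name is hypothetical, and the age limit is an assumption added so that a build that is still running does not lose its temporary files.

```shell
#!/bin/sh
# Hypothetical helper: remove leftover Robolectric temp entries from a
# directory (default /tmp) once they are older than a number of minutes.
clean_robolectric_tmp() {
    tmpdir=${1:-/tmp}
    min_age=${2:-60}   # assumption: no build keeps using an entry this long
    # -maxdepth 1 (a GNU/BSD find extension) only looks at top-level entries;
    # rm -rf then removes each matching file or directory as a whole.
    find "$tmpdir" -maxdepth 1 -name 'android-tmp-robolectric*' \
        -mmin +"$min_age" -exec rm -rf -- {} +
}
```

For example, `sudo sh -c '. ./clean.sh; clean_robolectric_tmp /tmp 120'` would drop entries untouched for two hours, where `clean.sh` is a hypothetical file holding the function.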
There are multiple copies of the Android SDK on the slaves, presumably because its installation path changed over time.
Among others, in megabytes:
3423 /mnt/home/jenkins-deploy/.android
1401 /mnt/home/jenkins-deploy/.android-sdk
684 /srv/jenkins-workspace/workspace/apps-android-wikipedia-publish/.sdk
684 /srv/jenkins-workspace/workspace/apps-android-wikipedia-test/.sdk
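A listing like the one above can be produced with du. This is a sketch using a hypothetical helper name, and it assumes GNU du for the -m and --max-depth options.

```shell
#!/bin/sh
# Hypothetical helper: print the largest entries directly under a
# directory, sizes in megabytes, biggest first.
largest_dirs() {
    dir=$1
    count=${2:-10}
    # -x stays on one filesystem, -m reports megabytes, --max-depth=1 limits
    # the report to $dir itself and its immediate subdirectories.
    du -xm --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Running `largest_dirs /mnt/home/jenkins-deploy 5` also lists hidden directories such as .android, since du walks every entry. The total for the directory itself shows up as one of the lines.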
I think /mnt/home/jenkins-deploy/.android-sdk can go. They
The worst offender is:
3421 /srv/home/jenkins-deploy/.android/build-cache
That one could probably use garbage collection.
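One possible shape for such a garbage collection is sketched below. It rests on unverified assumptions: that each top-level entry in the cache directory can be deleted independently, and that its modification time is a fair proxy for its last use. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete top-level entries of a cache directory that
# have not been modified for more than a given number of days.
prune_cache_older_than() {
    cache_dir=$1
    days=${2:-30}
    # Nothing to do when the cache directory does not exist.
    [ -d "$cache_dir" ] || return 0
    # -mindepth 1 keeps the cache directory itself; -maxdepth 1 treats each
    # top-level entry as one unit (both are GNU/BSD find extensions).
    find "$cache_dir" -mindepth 1 -maxdepth 1 -mtime +"$days" \
        -exec rm -rf -- {} +
}
```

For instance, `prune_cache_older_than /mnt/home/jenkins-deploy/.android/build-cache 14` would keep only entries modified in the last two weeks.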
/mnt/home/jenkins-deploy/.android-sdk
Feel free to try this out by moving this directory to a temporary location and triggering our jobs. (I tried but appear to no longer have root access on the CI machines to switch user to jenkins-deploy.)
Mentioned in SAL (#wikimedia-releng) [2017-04-12T14:54:58Z] <hashar> integration-slave-jessie-1002 : mv /mnt/home/jenkins-deploy/.android-sdk /mnt/home/jenkins-deploy/.android-sdk.T162635.back for T162635
Mentioned in SAL (#wikimedia-releng) [2017-04-12T14:56:22Z] <hashar> integration-slave-jessie-1001 : mv /mnt/home/jenkins-deploy/.android-sdk /mnt/home/jenkins-deploy/.android-sdk.T162635.back for T162635
My guess is that:
- the SDK used to be shared between jobs and set in $HOME/.android-sdk ( /mnt/home/jenkins-deploy/.android-sdk )
- the build somehow still points the cache to $HOME/.android
- gradle now uses $WORKSPACE/.sdk
It seems the path change occurred on Feb 9th, based on the last files touched in $HOME/.android-sdk.
I renamed /mnt/home/jenkins-deploy/.android-sdk and ran apps-android-wikipedia-test, which passed. So I nuked the backup directory.
Can we check what /mnt/home/jenkins-deploy/.android/build-cache is for? It is probably a bad thing to have it shared between builds and I am not sure whether it saves much time in the build anyway.
Mentioned in SAL (#wikimedia-releng) [2017-04-12T15:14:01Z] <hashar> rm -fR /mnt/home/jenkins-deploy/.android/build-cache/* # T162635
After deleting /mnt/home/jenkins-deploy/.android/build-cache I rebuilt the job apps-android-wikipedia-test. It took 4 minutes 30 seconds, which is the same build time as other builds. So the build cache most probably does not bring any performance gain and we can disable it.
More on the build cache here:
Android Studio 2.2 Beta 3 introduces a new build cache feature that can speed up build times for clean builds by storing and reusing files/directories that were created in previous builds of the same or different Android project.
The build cache is meant to be used across all Android projects.
I think we want to keep caching enabled unless it's taking up too much space. On my local machine it's 172MB (I don't have permissions to check the server usage but assume it's similar). We currently disable pre-dexing for CI builds. If we do need to disable build caching, we should probably make the change in the same place as pre-dexing (app/build.gradle of the Android app repo).
The build cache ( /mnt/home/jenkins-deploy/.android/build-cache ) is way smaller now, so at least that is a thing. Let's keep it around :} There is plenty of disk space available on both instances:
integration-slave-jessie-1001:~# df -h / /srv
Filesystem                          Size  Used Avail Use% Mounted on
/dev/vda3                            19G   11G  7.5G  58% /
/dev/mapper/vd-second--local--disk   21G   12G  7.3G  62% /srv
integration-slave-jessie-1002:~# df -h / /srv
Filesystem                          Size  Used Avail Use% Mounted on
/dev/vda3                            19G   12G  6.1G  66% /
/dev/mapper/vd-second--local--disk   21G   10G  9.1G  53% /srv
Thanks @Niedzielski !