Investigate disk usage of integration-slave-jessie-1002
Closed, ResolvedPublic

Description

Shinken-wm keeps reporting to IRC:
Free space - all mounts on integration-slave-jessie-1002 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found) integration.integration-slave-jessie-1002.diskspace.root.byte_percentfree (<11.11%)
We need to figure out what's using up disk space and what exactly we can clean-up to prevent this error from returning.

Zppix created this task. Mon, Apr 10, 8:43 PM
Restricted Application added a subscriber: Aklapper. Mon, Apr 10, 8:43 PM

Yeah / is almost full:

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        19G   17G  1.1G  95% /

$ df -ih /
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/vda3        1.2M  830K  387K   69% /
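
To compare against the <11.11% threshold in the alert, the free percentage can be derived from df. This is only a rough sketch: percent_free is a made-up helper name, and I am assuming the monitored byte_percentfree metric is close to available/total, which may not match exactly how the collector computes it.

```shell
# Hypothetical helper (not part of any existing tooling).
# Reads `df -P` output on stdin and prints "<mount point> <percent free>"
# for each filesystem, skipping the header line.
percent_free() {
    awk 'NR > 1 && $2 > 0 { printf "%s %.2f\n", $6, $4 * 100 / $2 }'
}

# Usage on a slave:
#   df -P / /srv | percent_free
```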

/tmp is full of files:

android-tmp-robolectric999080544390212506
android-tmp-robolectric999120214029539769
android-tmp-robolectric999406133185790303

Zppix added a comment. Mon, Apr 10, 8:49 PM

@hashar maybe it's a good idea to invest in a script that automatically cleans up /tmp?

Mentioned in SAL (#wikimedia-releng) [2017-04-10T20:49:46Z] <hashar> integration-slave-jessie-1002 : cleaning up /tmp: sudo find /tmp -path '/tmp/android-tmp-robo*' -delete # T162635

Mentioned in SAL (#wikimedia-releng) [2017-04-10T20:52:49Z] <hashar> integration-slave-jessie-1001 : cleaning up /tmp: sudo find /tmp -path '/tmp/android-tmp-robo*' -delete # T162635

Cleared out /tmp.
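
If we want the cleanup automated as suggested above, something along these lines could run from cron. It is a sketch only: the function name, the default directory and the age threshold are assumptions, and it relies on -mindepth/-maxdepth, which GNU and BSD find support but POSIX does not require.

```shell
# Hypothetical helper: remove stale Robolectric temp entries.
#   $1 = directory to clean (assumed default: /tmp)
#   $2 = age in days (assumed default: 1)
clean_robolectric_tmp() {
    dir="${1:-/tmp}"
    max_age_days="${2:-1}"
    # Only top-level entries with the prefix seen in this task are matched.
    # -mtime +N counts whole 24-hour periods, so entries modified recently
    # (for example by a running build) are left alone.
    find "$dir" -mindepth 1 -maxdepth 1 -name 'android-tmp-robolectric*' \
        -mtime +"$max_age_days" -exec rm -rf -- {} +
}

# Usage:
#   clean_robolectric_tmp /tmp 1
```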

There are multiple copies of the Android SDK on the slaves, I guess because its installation path was moved over time.

Among others, in megabytes:

3423 /mnt/home/jenkins-deploy/.android
1401 /mnt/home/jenkins-deploy/.android-sdk
684 /srv/jenkins-workspace/workspace/apps-android-wikipedia-publish/.sdk
684 /srv/jenkins-workspace/workspace/apps-android-wikipedia-test/.sdk
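
A listing like the one above can be reproduced with du. A minimal sketch, where largest_dirs is a made-up helper name and the paths in the usage line are simply the ones from this task:

```shell
# Hypothetical helper: print the N largest of the given paths, in megabytes.
#   $1 = number of entries to show, remaining arguments = paths to measure
largest_dirs() {
    count="$1"
    shift
    # -x stays on one filesystem, -s summarises each argument, -m reports MB.
    du -xsm -- "$@" 2>/dev/null | sort -rn | head -n "$count"
}

# Usage:
#   largest_dirs 10 /mnt/home/jenkins-deploy/.android* \
#       /srv/jenkins-workspace/workspace/*/.sdk
```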

I think /mnt/home/jenkins-deploy/.android-sdk can go. They

The worst offender is:

3421 /srv/home/jenkins-deploy/.android/build-cache

That one could probably use garbage collection.

/mnt/home/jenkins-deploy/.android-sdk

Feel free to try this out by moving this directory to a temporary location and triggering our jobs. (I tried but appear to no longer have root access on the CI machines to switch user to jenkins-deploy.)

Mentioned in SAL (#wikimedia-releng) [2017-04-12T14:54:58Z] <hashar> integration-slave-jessie-1002 : mv /mnt/home/jenkins-deploy/.android-sdk /mnt/home/jenkins-deploy/.android-sdk.T162635.back for T162635

Mentioned in SAL (#wikimedia-releng) [2017-04-12T14:56:22Z] <hashar> integration-slave-jessie-1001 : mv /mnt/home/jenkins-deploy/.android-sdk /mnt/home/jenkins-deploy/.android-sdk.T162635.back for T162635

hashar added a comment. Edited Wed, Apr 12, 3:02 PM

My guess is that:

  • the SDK used to be shared between jobs and set in $HOME/.android-sdk ( /mnt/home/jenkins-deploy/.android-sdk )
  • the build somehow still points the cache to $HOME/.android
  • gradle uses $WORKSPACE/.sdk

It seems the path change occurred on Feb 9th based on last files touched in $HOME/.android-sdk.

I renamed /mnt/home/jenkins-deploy/.android-sdk and ran apps-android-wikipedia-test, which passed. So I nuked the backup directory.

Can we check what /mnt/home/jenkins-deploy/.android/build-cache is for? It is probably a bad thing to have it shared between builds and I am not sure whether it saves much time in the build anyway.

Mentioned in SAL (#wikimedia-releng) [2017-04-12T15:14:01Z] <hashar> rm -fR /mnt/home/jenkins-deploy/.android/build-cache/* # T162635

After I deleted /mnt/home/jenkins-deploy/.android/build-cache I rebuilt the job apps-android-wikipedia-test. It took 4 minutes 30 seconds, which is the same build time as other builds. So most probably the build cache does not bring any performance benefit and we can disable it.

More on the build cache here:

Android Studio 2.2 Beta 3 introduces a new build cache feature that can speed up build times for clean builds by storing and reusing files/directories that were created in previous builds of the same or different Android project.
The build cache is meant to be used across all Android projects.

I think we want to keep caching enabled unless it's taking up too much space. On my local machine it's 172MB (I don't have permissions to check the server usage but assume it's similar). We currently disable pre-dexing for CI builds. If we do need to disable build caching, we should probably make the change in the same place as pre-dexing (app/build.gradle of the Android app repo).
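
For reference, if we do end up disabling it: as far as I know the Android Gradle plugin reads the build cache switch from gradle.properties (android.enableBuildCache), not from build.gradle, so treat the location as an assumption to verify against the plugin version in use. A small sketch for a CI step, with set_build_cache being a made-up helper name:

```shell
# Hypothetical helper: set android.enableBuildCache in a gradle.properties file.
#   $1 = path to gradle.properties, $2 = true or false
set_build_cache() {
    props="$1"
    value="$2"
    touch "$props"
    # Drop any previous setting, keep every other line, then append the new value.
    grep -v '^android\.enableBuildCache=' "$props" > "$props.tmp" || true
    printf 'android.enableBuildCache=%s\n' "$value" >> "$props.tmp"
    mv "$props.tmp" "$props"
}

# Usage, from the root of the app repository:
#   set_build_cache gradle.properties false
```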

hashar closed this task as "Resolved". Fri, Apr 14, 1:35 PM
hashar claimed this task.

The build cache at /mnt/home/jenkins-deploy/.android/build-cache is way smaller now, so at least that is something. Let's keep it around :} There is plenty of disk space available on both instances:

integration-slave-jessie-1001:~# df -h / /srv
Filesystem                          Size  Used Avail Use% Mounted on
/dev/vda3                            19G   11G  7.5G  58% /
/dev/mapper/vd-second--local--disk   21G   12G  7.3G  62% /srv
integration-slave-jessie-1002:~# df -h / /srv
Filesystem                          Size  Used Avail Use% Mounted on
/dev/vda3                            19G   12G  6.1G  66% /
/dev/mapper/vd-second--local--disk   21G   10G  9.1G  53% /srv

Thanks @Niedzielski !