
Jenkins jobs leaving workdirs, filling disk on integration-agent-docker-1041
Closed, Duplicate (Public)

Description

I took integration-agent-docker-1041 offline due to disk space issues on /srv.

There were three jobs running there when I marked it offline, but there are 25 workspace directories, the oldest from 8 Feb:

root@integration-agent-docker-1041:/srv/jenkins/workspace# ls -lhA
total 84K
drwxr-xr-x 18 jenkins-deploy wikidev 4.0K Apr  2 11:58 alerts-pipeline-test
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Apr  2 11:58 alerts-pipeline-test@tmp
drwxr-xr-x 10 jenkins-deploy wikidev 4.0K Mar  5 13:16 cxserver-pipeline-test
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Mar  5 13:16 cxserver-pipeline-test@tmp
drwxr-xr-x  9 jenkins-deploy wikidev 4.0K Mar 28 07:54 inference-services-pipeline-article-country-publish
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Mar 28 07:54 inference-services-pipeline-article-country-publish@tmp
drwxr-xr-x  9 jenkins-deploy wikidev 4.0K Mar 28 11:03 inference-services-pipeline-edit-check
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Mar 28 11:03 inference-services-pipeline-edit-check@tmp
drwxr-xr-x  9 jenkins-deploy wikidev 4.0K Mar 26 09:04 inference-services-pipeline-revertrisk-multilingual
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Mar 26 09:13 inference-services-pipeline-revertrisk-multilingual@tmp
drwxr-xr-x  9 jenkins-deploy wikidev 4.0K Mar 31 15:37 inference-services-pipeline-revertrisk-wikidata
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Mar 31 15:42 inference-services-pipeline-revertrisk-wikidata@tmp
drwxr-xr-x  9 jenkins-deploy wikidev 4.0K Mar 26 09:04 inference-services-pipeline-revscoring
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Mar 26 09:04 inference-services-pipeline-revscoring@tmp
drwxr-xr-x  9 jenkins-deploy wikidev 4.0K Mar  5 13:16 machinetranslation-pipeline-test
drwxr-xr-x  2 jenkins-deploy wikidev 4.0K Mar  5 13:17 machinetranslation-pipeline-test@tmp
drwxr-xr-x  5 jenkins-deploy wikidev 4.0K Apr  3 19:57 quibble-composer-mysql-php74
drwxr-xr-x  5 jenkins-deploy wikidev 4.0K Apr  3 19:58 quibble-composer-mysql-php82
drwxr-xr-x  5 jenkins-deploy wikidev 4.0K Feb  8 15:54 quibble-composer-mysql-php83@2
drwxr-xr-x  3 jenkins-deploy wikidev 4.0K Mar 11 08:31 wikidata-query-rdf-maven-release@tmp
drwxr-xr-x  3 jenkins-deploy wikidev 4.0K Feb 17 15:49 wikimedia-event-utilities-maven-release@tmp

We have a postbuild step on most docker jobs to wipe the workspace directories, so these should all be empty. But there are no jobs running there now and these workspaces are not empty:

root@integration-agent-docker-1041:/srv/jenkins/workspace# du -ch ./* | sort -rh | head       
24G     total
9.7G    ./quibble-composer-mysql-php74
9.3G    ./quibble-composer-mysql-php82
8.6G    ./quibble-composer-mysql-php82/src
8.6G    ./quibble-composer-mysql-php74/src
6.8G    ./quibble-composer-mysql-php82/src/extensions
6.8G    ./quibble-composer-mysql-php74/src/extensions
4.9G    ./quibble-composer-mysql-php83@2
4.8G    ./quibble-composer-mysql-php83@2/src
3.4G    ./quibble-composer-mysql-php83@2/src/extensions

Event Timeline

4.9G ./quibble-composer-mysql-php83@2
drwxr-xr-x 5 jenkins-deploy wikidev 4.0K Feb 8 15:54 quibble-composer-mysql-php83@2

My guess is that the build was interrupted/aborted and the step that is supposed to clear the workspace did not run. I looked at that previously in T299995 and T352319 and deployed patches, but I guess they are not catching everything. Or maybe it is a leftover from before those patches.

The *pipeline* jobs are PipelineLib Groovy jobs, which do not fully clear their workspaces.

drwxr-xr-x 5 jenkins-deploy wikidev 4.0K Apr 3 19:57 quibble-composer-mysql-php74
drwxr-xr-x 5 jenkins-deploy wikidev 4.0K Apr 3 19:58 quibble-composer-mysql-php82
9.7G ./quibble-composer-mysql-php74
9.3G ./quibble-composer-mysql-php82

These look like ongoing jobs, and 10G is pretty much "normal". The instance's /srv is 44G, which IIRC I sized as 4G for git repo mirrors + 4 executors * 10G per build.
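To make the sizing explicit, here is a small sketch of the arithmetic, using the figures quoted in this task. The function name and variable names are made up for illustration; the 4.9G value is the leftover quibble-composer-mysql-php83@2 workspace from the du output above.

```python
def required_srv_gb(mirror_gb, executors, per_build_gb):
    """Disk budget for /srv: git mirrors plus one full build per executor."""
    return mirror_gb + executors * per_build_gb


# Figures from this task: 4G of git mirrors, 4 executors, ~10G per build.
budget_gb = required_srv_gb(mirror_gb=4, executors=4, per_build_gb=10)

# A single leftover workspace (4.9G here) eats into that budget, since the
# sizing assumes every workspace is wiped once its build finishes.
leftover_gb = 4.9
headroom_gb = budget_gb - leftover_gb

print(f"budget: {budget_gb}G, headroom with one stale workspace: {headroom_gb:.1f}G")
```

The budget leaves no slack for workspaces that are not cleaned up, so one stale quibble workspace already takes about half of an executor's share.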

My guess is the instances could be rebuilt with larger disks. A timer to clear out old workspaces would help as well.
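A rough sketch of what such a timer could run, assuming the workspaces live directly under /srv/jenkins/workspace as in the listing above. The function names, the age threshold and the dry-run default are all hypothetical, not an existing script. It only looks at the top-level directory mtime, which does not change when files deeper in the tree are written, and it does not check whether a build is still using the workspace, so the threshold would need to be generous (days, not hours).

```python
import shutil
import time
from pathlib import Path


def find_stale_workspaces(root, max_age_days, now=None):
    """Return top-level directories under root not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in sorted(Path(root).iterdir()):
        # Skip plain files and symlinks; only real workspace directories count.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if entry.stat().st_mtime < cutoff:
            stale.append(entry)
    return stale


def clean_workspaces(root, max_age_days, dry_run=True):
    """Delete stale workspaces; with dry_run=True only report what would go."""
    selected = find_stale_workspaces(root, max_age_days)
    for path in selected:
        if not dry_run:
            shutil.rmtree(path)
    return selected
```

Calling `clean_workspaces("/srv/jenkins/workspace", max_age_days=7)` would only list candidates; passing `dry_run=False` actually removes them. With a 7 day threshold, the Feb 8 quibble-composer-mysql-php83@2 directory and the March pipeline workspaces from the listing would have been selected, while the two Apr 3 quibble workspaces would have been left alone.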