Clean up old Docker images on deneb
Closed, Resolved, Public

Description

/var/lib/docker on deneb is > 70G.

Event Timeline

Legoktm triaged this task as Unbreak Now! priority. Jul 23 2021, 1:04 AM
Legoktm created this task.

user homes over 1G:

1.4G	ema
2.6G	jbond
5.2G	jmm
5.9G	filippo
7.7G	razzi
7.9G	akosiaris
8.6G	elukey
41G	otto

Please check whether anything can be cleaned up.
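
A list like the one above can be produced by filtering `du -sh` output. The following is a minimal sketch, assuming tab- or space-separated `du -h` lines (size, then directory name) and binary unit suffixes; `parse_size` and `homes_over` are hypothetical helper names, not an existing tool on deneb.

```python
# Sketch: pick out home directories above a size threshold from `du -sh` output.
# Assumption: sizes use du's human-readable suffixes (K, M, G, T = powers of 1024).

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a du-style size such as '1.4G' or '136M' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)  # no suffix: treat as plain bytes


def homes_over(du_output, threshold="1G"):
    """Return (name, size) pairs for entries strictly larger than threshold."""
    limit = parse_size(threshold)
    result = []
    for line in du_output.splitlines():
        if not line.strip():
            continue
        size, name = line.split(None, 1)
        if parse_size(size) > limit:
            result.append((name.strip(), size))
    return result


if __name__ == "__main__":
    sample = "136M\telukey\n1.4G\tema\n41G\totto\n"
    for name, size in homes_over(sample):
        print(f"{size}\t{name}")
```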

70G in /var/lib/docker...we have some pretty old images that can also be cleaned up from there

Mentioned in SAL (#wikimedia-operations) [2021-07-23T01:20:57Z] <legoktm> legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # T287222

Freed up 6G. It should be saved in the registry in case anyone was using it.

Thanks for the ping! Freed stuff on my home dir:

elukey@deneb:~$ du -hs
136M	.
elukey lowered the priority of this task from Unbreak Now! to High. Jul 23 2021, 6:51 AM

Lowering the priority since we now have 22G available.

Cleared my home down to 1.2G. /home in total is down to 30G (and Alex is out), so I'm retitling the task to focus on trimming the Docker data.

Mentioned in SAL (#wikimedia-operations) [2021-07-26T07:17:47Z] <_joe_> manage-production-images prune on deneb, T287222

MoritzMuehlenhoff renamed this task from deneb.codfw.wmnet root partition is full to Clean up old Docker images on deneb. Aug 2 2021, 7:02 AM
MoritzMuehlenhoff lowered the priority of this task from High to Medium.

I've done some cleaning in my home too, down to ~500M now.

Cleaned up some old istio/knative/kubeflow images, got down to this:

Data Space Used: 66.39GB
Data Space Total: 107.4GB
Data Space Available: 40.98GB

We should try to reduce the images even more. I got a build error today from docker-pkg stating:

[docker-pkg-build] ERROR - Build failed: devmapper: Thin Pool has 158763 free data blocks which is less than minimum required 163840 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior (image.py:205)
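
The numbers in that error can be checked by hand. The sketch below assumes the devicemapper defaults as I understand them (a 64 KiB block size for `dm.blocksize` and a 10% `dm.min_free_space`) and infers a pool of 1,638,400 blocks from the "Data Space Total: 107.4GB" figure reported earlier in this task. None of these values were read from deneb's actual configuration, and the function names are hypothetical.

```python
# Sketch: reproduce the thin-pool free-space check from the error message.
# Assumptions (not confirmed on deneb): 64 KiB blocks, 10% minimum free space,
# and a pool of 1,638,400 blocks (100 GiB, which Docker would print as 107.4GB).

BLOCK_SIZE = 64 * 1024       # assumed dm.blocksize default, in bytes
TOTAL_BLOCKS = 1_638_400     # assumed: inferred from "Data Space Total: 107.4GB"
MIN_FREE_PERCENT = 10        # assumed dm.min_free_space default


def min_free_blocks(total_blocks, percent=MIN_FREE_PERCENT):
    """Minimum number of free data blocks required before a build may proceed."""
    return total_blocks * percent // 100


def shortfall_blocks(free_blocks, total_blocks=TOTAL_BLOCKS):
    """How many more blocks must be freed (0 if there is already enough space)."""
    return max(0, min_free_blocks(total_blocks) - free_blocks)


if __name__ == "__main__":
    free = 158_763  # free data blocks quoted in the error message
    missing = shortfall_blocks(free)
    print("required free blocks:", min_free_blocks(TOTAL_BLOCKS))
    print("missing blocks:", missing)
    print("missing bytes:", missing * BLOCK_SIZE)
```

Under these assumptions the required minimum matches the 163840 blocks quoted in the error, which suggests the build was blocked by a shortfall of a few hundred megabytes rather than a completely full pool.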

Failed to build an image today:

2021-09-13 09:51:42,980 [docker-pkg-build] ERROR - Build failed: devmapper: Thin Pool has 149221 free data blocks which is less than minimum required 163840 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior (image.py:205)
/dev/vda1       230G  166G   53G  76% /

I ran prune-production-images.service which was enough for my current needs:

/dev/vda1       230G  140G   78G  65% /

We also have a bunch of releng images there and they won't be cleaned up AIUI:

docker-registry.wikimedia.org/releng/composer-scratch                    1.10.22             645603d0c871        4 months ago        2MB
docker-registry.wikimedia.org/releng/ci-common                           0.4                 8dc9644f1942        6 months ago        807B
docker-registry.wikimedia.org/releng/maven                               3.5.2-1             09ea7f38823f        11 months ago       10.6MB
docker-registry.wikimedia.org/releng/npm-test-graphoid                   0.3.0-s2            9411a3651342        12 months ago       440MB
docker-registry.wikimedia.org/releng/npm-test-3d2png                     0.3.0-s2            9d74588b3ebf        12 months ago       494MB
docker-registry.wikimedia.org/releng/npm-test                            0.7.1-s1            1e032858f949        12 months ago       393MB
docker-registry.wikimedia.org/releng/npm                                 0.4.0-s1            37326b61fe59        12 months ago       393MB
docker-registry.wikimedia.org/releng/ci-jessie                           0.5.1-s1            6d297ee12d71        12 months ago       159MB
docker-registry.wikimedia.org/releng/composer-test-php56                 0.2.0-s2            e72dedfabcdd        16 months ago       322MB
docker-registry.wikimedia.org/releng/composer-php56                      0.2.0-s2            84a4c02ec933        16 months ago       322MB
docker-registry.wikimedia.org/releng/php56                               0.1.2               62335a14b59f        17 months ago       319MB
docker-registry.wikimedia.org/releng/ci-src-setup                        0.3.1-s3            0360e06f7692        23 months ago       326MB
docker-registry.wikimedia.org/releng/npm-test-librdkafka                 0.2.1-s1            0859dde0466b        2 years ago         400MB
# docker system df

TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              194                 17                  52.23GB             51.16GB (97%)
Containers          31                  0                   9.356GB             9.356GB (100%)
Local Volumes       3235                0                   0B                  0B
Build Cache         0                   0                   0B                  0B

What if we just run docker system prune --force from time to time? That would force images to be pulled from the registry again on rebuilds, but that might not be a big deal in terms of build time.
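
To judge what a periodic prune could free, the RECLAIMABLE column of `docker system df` can be totalled. This is a minimal sketch that parses the table pasted above; it assumes columns are separated by two or more spaces and that sizes use Docker's decimal units (GB = 10^9 bytes). `to_bytes` and `reclaimable` are hypothetical helper names. As far as I understand the Docker CLI, plain `docker system prune` only removes dangling images, and unused tagged images are removed only with `--all`, so the "Images" figure is an upper bound for the command as written.

```python
# Sketch: total the RECLAIMABLE column of `docker system df`.
# Assumptions: columns are separated by 2+ spaces; sizes use decimal units.
import re

UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

SAMPLE = """\
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              194                 17                  52.23GB             51.16GB (97%)
Containers          31                  0                   9.356GB             9.356GB (100%)
Local Volumes       3235                0                   0B                  0B
Build Cache         0                   0                   0B                  0B
"""


def to_bytes(text):
    """Convert a Docker-style size such as '51.16GB' or '0B' to bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([kMGT]?B)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def reclaimable(df_output):
    """Map each row type (Images, Containers, ...) to reclaimable bytes."""
    result = {}
    for line in df_output.splitlines():
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) != 5 or cols[0] == "TYPE":
            continue  # skip the header and anything that is not a data row
        # The last column may carry a percentage, e.g. "51.16GB (97%)".
        result[cols[0]] = to_bytes(cols[4].split()[0])
    return result


if __name__ == "__main__":
    per_type = reclaimable(SAMPLE)
    for kind, size in per_type.items():
        print(f"{kind}: {size / 1e9:.2f} GB")
    print(f"total: {sum(per_type.values()) / 1e9:.2f} GB")
```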

My home directory's down to 2.5M now too :)

Mentioned in SAL (#wikimedia-operations) [2021-11-18T09:18:07Z] <jayme> systemctl start prune-production-images.service on deneb - T287222

Mentioned in SAL (#wikimedia-operations) [2022-01-12T19:22:33Z] <mutante> deneb - for some reason the "package builder clean up build directory"-service fails T287222

[deneb:~] $ sudo systemctl status  package_builder_Clean_up_build_directory.service
● package_builder_Clean_up_build_directory.service - Delete builds older the 2 weeks
   Loaded: loaded (/lib/systemd/system/package_builder_Clean_up_build_directory.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-01-12 19:21:40 UTC; 23s ago
  Process: 26013 ExecStart=/usr/bin/find /var/cache/pbuilder/build -type f -daystart -mtime +14 -delete (code=exited, status=1/FAILURE)
 Main PID: 26013 (code=exited, status=1/FAILURE)

Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/fdinfo/1’: Operation not permitted
Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/fdinfo/2’: Operation not permitted
Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/fdinfo/3’: Operation not permitted
Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/fdinfo/4’: Operation not permitted
Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/status’: Operation not permitted
Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/cmdline’: Operation not permitted
Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/stat’: Operation not permitted
Jan 12 19:21:39 deneb find[26013]: /usr/bin/find: cannot delete ‘/var/cache/pbuilder/build/cow.12043/proc/28098/maps’: Operation not permitted
Jan 12 19:21:40 deneb systemd[1]: package_builder_Clean_up_build_directory.service: Main process exited, code=exited, status=1/FAILURE
Jan 12 19:21:40 deneb systemd[1]: package_builder_Clean_up_build_directory.service: Failed with result 'exit-code'.
jbond claimed this task.

I have cleaned this up manually. It seems an old build environment hadn't torn itself down properly.
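
The find failures above appear to come from a /proc that was still mounted inside the stale cow.12043 environment. One way to make such a cleanup more tolerant is to skip mount points instead of descending into them. The sketch below is only an illustration of that idea, not the actual package_builder cleanup job: `prune_old_files` is a hypothetical name, and its age cutoff is a plain "older than N days" check that only approximates `find -daystart -mtime +14`.

```python
# Sketch: delete old regular files under a build directory, skipping mount points
# (such as a leftover /proc inside a chroot that was never torn down).
import os
import stat
import time


def prune_old_files(root, max_age_days=14, now=None):
    """Delete regular files older than max_age_days; return (deleted, skipped_mounts)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    deleted, skipped = [], []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        keep = []
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if os.path.ismount(full):
                skipped.append(full)   # report it, but do not descend
            else:
                keep.append(name)
        dirnames[:] = keep             # in-place edit steers os.walk
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
                # Only regular files, like find's -type f (symlinks are ignored).
                if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff:
                    os.unlink(path)
                    deleted.append(path)
            except OSError:
                continue               # vanished or undeletable: leave it alone
    return deleted, skipped
```

The returned list of skipped mount points could be logged so that a leftover chroot is noticed and torn down by hand, as was done here.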