Run docker-gc on deploy servers
Closed, Resolved (Public)

Description

New container images are built on deploy1002 for every scap sync-world. The images have been accumulating, and deploy1002 currently holds 111.5GB of them.

Run docker-gc on deploy servers to keep image storage within configured limits (suggest 20GB/30GB low/high water marks). See https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/gitlab/runner.pp#141 for example use.
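To make the low/high water mark suggestion concrete, here is a minimal sketch of the usual meaning of such a pair: collection starts once usage exceeds the high mark and continues until usage is back at the low mark. The function name `gc_decision` and its whole-GB arguments are hypothetical, and this is an assumption about watermark semantics in general, not a description of how docker-gc is actually implemented.

```shell
# Hypothetical helper (not part of docker-gc): report what a
# watermark-based collector would do for a given disk usage.
# Arguments: current usage, low mark, high mark (whole GB).
gc_decision() {
    usage_gb=$1
    low_gb=$2
    high_gb=$3
    if [ "$usage_gb" -gt "$high_gb" ]; then
        # Above the high mark: free space until usage drops to the low mark.
        echo "collect: free $((usage_gb - low_gb))GB to reach ${low_gb}GB"
    else
        # At or below the high mark: nothing to do yet.
        echo "idle: ${usage_gb}GB is within the ${high_gb}GB high water mark"
    fi
}

# deploy1002 at roughly 112GB with the suggested 20GB/30GB marks.
gc_decision 112 20 30
```

With the figures from this task, usage is far above the 30GB high mark, so a collector following this scheme would need to free about 92GB.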

Event Timeline

jnuche changed the task status from Open to In Progress. Mar 7 2023, 2:52 PM

docker-gc needs a state file created by the docker-resource-access-monitor in order to run.

I could manually create an empty state file, or first run the monitor on deploy1002, but with an empty state file the gc would pick images to delete in an undefined order. So in the end I preferred to simply prune older images.

Disk usage for Docker images on deploy1002 is now below 20GB:

[jnuche@deploy1002 ~]$ docker system df
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              108                 0                   19.12GB             19.12GB (100%)
Containers          0                   0                   0B                  0B
Local Volumes       0                   0                   0B                  0B
Build Cache         0                   0                   0B                  0B

I don't understand how this ticket was resolved. I don't see any reference to Puppet changes that ensure docker-gc is installed and configured on the deploy servers.

Sorry, I misunderstood the intention of the task. I was actually going to propose a follow-up task to automate this.

I have reopened the task and will add the Puppet changes next.

Change 899718 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] docker::gc: clean up older images from deployment servers using timer

https://gerrit.wikimedia.org/r/899718

Change 900313 had a related patch set uploaded (by Jaime Nuche; author: Jaime Nuche):

[operations/puppet@production] deployment_server: clean up older images using systemd timer

https://gerrit.wikimedia.org/r/900313

Change 899718 abandoned by Jaime Nuche:

[operations/puppet@production] docker::gc: clean up older images from deployment servers using timer

Reason:

Abandoned in favor of a different approach

https://gerrit.wikimedia.org/r/899718

Change 900313 merged by Clément Goubert:

[operations/puppet@production] deployment_server: clean up older images using systemd timer

https://gerrit.wikimedia.org/r/900313

cgoubert@deploy2002:~$ sudo docker images | wc -l
320
cgoubert@deploy2002:~$ sudo systemctl start docker-image-prune-old.service 
cgoubert@deploy2002:~$ sudo docker images | wc -l
238

In the end I opted to automate the cleanup of images using a systemd unit calling docker image prune.

The outcome of the task, then, is:

  • Systemd unit reaps images older than 2 weeks on deployment servers daily
  • docker-gc repo revamped to generate+publish its docker images using Blubber and Kokkuri
  • Puppet configuration for docker-gc updated to use latest docker-gc images and corresponding features
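For reference, the two-week retention in the first item can be expressed with `docker image prune` and its `until` filter, which takes Go-style durations (so 14 days becomes `336h`). The sketch below only prints the command. The helper name `prune_command` is hypothetical, and the exact flags used by docker-image-prune-old.service are an assumption here; the merged change (900313) is the authoritative source.

```shell
# Hypothetical helper: build a prune command for a retention period in days.
# Assumption: the real unit may use different flags; check change 900313.
prune_command() {
    days=$1
    # --all    also removes unused tagged images, not just dangling ones
    # --force  skips the interactive confirmation prompt
    # until=   limits removal to images created before the given age
    echo "docker image prune --all --force --filter until=$((days * 24))h"
}

# Print the command for a two-week retention period.
prune_command 14
```

Printing rather than executing keeps the sketch safe to try on any host; on a deployment server the printed command would need to be run with sufficient privileges (for example via sudo).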