Pipeline image build cleanup
Closed, Resolved
Public

Description

As we start doing builds, we'll need a way to clean up Docker image layers that are no longer needed.

Event Timeline

It looks like Docker now has various prune commands that should be useful for this case:

$ docker image prune -a

WARNING! This will remove all images without at least one container associated to them.

$ docker system prune -a

WARNING! This will remove:
	- all stopped containers
	- all volumes not used by at least one container
	- all networks not used by at least one container
	- all images without at least one container associated to them

By default, these commands only delete "dangling" images, which excludes tagged images such as the ones resulting from the build phase of the pipeline. I think the -a option should include tagged images as well, but we'd want to make sure it isn't so aggressive that it blows away cached base images on a regular basis (or maybe we don't care, depending on how often this runs?). We can also use --filter 'until=' with a timestamp or duration (e.g. 8h) to limit prunes to older images, though I'm not sure how this interacts with -a.
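On the -a plus until= question: Docker's own documentation shows the two used together, and there the filter narrows what -a would otherwise remove to unused images created before the cutoff. A minimal sketch of that combination follows. The function name prune_unused_older_than is made up for illustration, and 8h is just the example duration mentioned above, not an agreed threshold.

```shell
# Sketch: prune unused images (tagged or not) created before a cutoff.
# prune_unused_older_than is a hypothetical helper name; the 8h default is
# only the example duration from the comment above.
prune_unused_older_than() {
    threshold="${1:-8h}"
    # -a widens the prune from dangling images to every image that no
    # container uses; the until filter then limits it to images created
    # before the cutoff. --force skips the interactive confirmation.
    docker image prune -a --force --filter "until=${threshold}"
}
```

Running prune_unused_older_than 8h on a scratch host first would be a cheap way to confirm the behaviour before putting it in a maintenance job.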

One thing to be mindful of is that hosts where we run Blubber-built test images may not be used exclusively for those images. Depending on how we set up the test portion of the pipeline, we may be sharing a host with other Docker CI jobs, and we wouldn't want to blow away their images. Excluding images tagged wmfreleng/* would probably be enough to avoid clobbering those.
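As far as I know, the prune commands only filter on until and label, not on repository name, so excluding wmfreleng/* would mean listing the images ourselves and filtering the list. A rough sketch of the filtering half, assuming the references come in as repository:tag lines; filter_unprotected is a hypothetical name.

```shell
# Sketch: drop protected repositories from a list of image references.
# Reads "repository:tag" lines on stdin and prints only those that are NOT
# under the wmfreleng/ namespace. filter_unprotected is a hypothetical name.
#
# Assumed usage (not executed here):
#   docker image ls --format '{{.Repository}}:{{.Tag}}' | filter_unprotected
filter_unprotected() {
    # grep exits non-zero when nothing is left to print; "|| true" keeps
    # that from aborting a caller that runs with "set -e".
    grep -v '^wmfreleng/' || true
}
```

Whatever the filter prints would then be the candidate list for removal. Images that appear under a registry-prefixed name would need a broader pattern than this one.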

Definitely. Anything tagged won't be deleted by docker image prune or docker system prune unless we give it the -a option.

After our chat yesterday in IRC, it seemed reasonable to do something like:

  1. Iterate over images that were built and tagged by the pipeline script or maybe just Blubber in general—we could have Blubber add some useful labels by default
  2. Remove all tags from each image if it's older than our cleanup threshold
  3. Run docker image prune --filter 'until=[threshold]' and let it delete the newly untagged images as well as other dangling images
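
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions the thread has not settled: the label name blubber.variant is a placeholder for whatever label Blubber or the pipeline script ends up adding, the threshold is given in hours, GNU date is assumed for parsing Docker's timestamps, and all function names are made up.

```shell
# Sketch of the three cleanup steps. Assumptions, none confirmed here:
#   - pipeline images carry a label; "blubber.variant" is a placeholder
#   - GNU date is available to parse Docker's RFC 3339 timestamps
PIPELINE_LABEL="${PIPELINE_LABEL:-blubber.variant}"

# Pure helper: read "reference created_epoch" lines on stdin and print the
# references created before the cutoff epoch passed as $1.
refs_older_than() {
    awk -v cutoff="$1" '$2 < cutoff { print $1 }'
}

# Step 1: list tagged pipeline images with their creation time as an epoch.
list_pipeline_refs() {
    docker image ls --filter "label=${PIPELINE_LABEL}" \
        --format '{{.Repository}}:{{.Tag}}' |
    grep -v '<none>' |
    while read -r ref; do
        created="$(docker image inspect --format '{{.Created}}' "$ref")"
        echo "$ref $(date -d "$created" +%s)"
    done
}

# Steps 2 and 3: untag everything older than the threshold, then prune.
cleanup_pipeline_images() {
    hours="${1:-8}"
    cutoff="$(( $(date +%s) - hours * 3600 ))"
    list_pipeline_refs | refs_older_than "$cutoff" |
    while read -r ref; do
        # Drops this tag; Docker may delete the image outright if that
        # was its last reference, otherwise it is left untagged.
        docker rmi "$ref"
    done
    # Picks up the newly untagged images along with other dangling ones.
    docker image prune --force --filter "until=${hours}h"
}
```

Keeping the age comparison in refs_older_than as a plain stdin filter means that part can be checked without a Docker daemon.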

thcipriani moved this task from In-progress to Backlog on the Release-Engineering-Team (Kanban) board.

Not currently working on this, but may pick it up again in the near future.

The other day I used this nasty little Ruby one-liner to clean up docker-pkg images that weren't latest. It wasn't perfect (Docker complained about trying to delete some images that were parents of others) but it's a start.

docker image ls --format "{{.ID}} {{.Tag}}" | ruby -e 'images = {}; while gets; f = $_.split; (images[f[0]] ||= []) << f[1]; end; puts images.reduce([]) { |r, (k, v)| v.include?("latest") ? r : r << k }' | xargs docker rmi
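
For anyone who finds the one-liner hard to read, here is the same selection sketched with awk: group tags by image ID and print the IDs that never appear with a latest tag. ids_without_latest is a hypothetical name, and this does nothing about the parent-image complaints mentioned above.

```shell
# Sketch: the same selection as the Ruby one-liner, written with awk.
# Reads "image_id tag" lines on stdin and prints, in first-seen order,
# each ID that has no "latest" tag. ids_without_latest is a made-up name.
#
# Assumed usage (not executed here; -r is the GNU xargs flag that skips
# the command when the input is empty):
#   docker image ls --format '{{.ID}} {{.Tag}}' | ids_without_latest | xargs -r docker rmi
ids_without_latest() {
    awk '
        NF < 2 { next }                                  # skip malformed lines
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }  # remember each ID once
        $2 == "latest" { latest[$1] = 1 }                # mark IDs tagged latest
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in latest)) print order[i]
        }
    '
}
```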

This makes sense. docker-pkg should only look for the images that are present in the Docker changelog files in the integration/config repository, so any image tagged :latest should be the most recent version of that image. Older images, while they may still be referenced in jjb and in Jenkins, can be removed from contint1001 without impacting docker-pkg runs, since they have all been pushed to the Docker registry.

Blubber is more dependent on the Docker cache, but only for test images, not for production images. For test images, we could probably remove Blubber images that are older than n, where n ~= 2 weeks.

Change 490505 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/config@master] maintenance: Cleanup old Docker images at a lower threshold

https://gerrit.wikimedia.org/r/490505

Change 490505 had a related patch set uploaded (by Thcipriani; owner: Dduvall):
[integration/config@master] maintenance: Cleanup old Docker images at a lower threshold

https://gerrit.wikimedia.org/r/490505

Change 490505 merged by jenkins-bot:
[integration/config@master] maintenance: Cleanup old Docker images at a lower threshold

https://gerrit.wikimedia.org/r/490505

The maintenance job is up to date and should be cleaning up images on contint1001 as well as the other Jenkins agents.