https://www.raftt.io/post/keeping-your-private-image-registry-clean.html
The link above explains our situation very well: every time we delete images/tags via the remove API, nothing is actually removed behind the scenes. The catalog and the Swift storage are left untouched until a proper garbage collection runs.
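To make the "remove API" step concrete, here is a minimal sketch of the two HTTP requests involved, assuming the standard registry HTTP API v2 and a registry with deletion enabled. The registry URL, repository name and helper function names are hypothetical, and the requests are only built, not sent. Even when the DELETE succeeds, the blobs stay in storage until garbage collection runs.

```python
from urllib.request import Request

# Media type to request so that the returned digest refers to the v2 manifest.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def digest_request(registry_url, repository, tag):
    """Build a HEAD request; the Docker-Content-Digest response header
    carries the digest of the manifest that the tag points to."""
    url = f"{registry_url}/v2/{repository}/manifests/{tag}"
    return Request(url, method="HEAD", headers={"Accept": MANIFEST_V2})


def delete_request(registry_url, repository, digest):
    """Build a DELETE request for a manifest, addressed by digest.
    This only removes the reference: layers remain on disk until GC."""
    url = f"{registry_url}/v2/{repository}/manifests/{digest}"
    return Request(url, method="DELETE")


# Hypothetical values, for illustration only.
req = delete_request("https://registry.example.org", "team/app", "sha256:abc")
print(req.get_method(), req.full_url)
```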
How can we garbage collect?
We know of at least two options:
- Use the built-in garbage collection feature of Docker Distribution. It works in two steps: one "marks" all the images/tags to delete, the other "sweeps" and deletes them. The mark step is very slow, and in our case it could probably take 2 to 3 days to complete. While GC runs, the Docker Registry needs to be in read-only mode, which could affect daily deployments and builds.
- Use the Docker Distribution Pruner tool, which on paper should be much faster. The main issue is that it is marked as experimental and not production ready, and its latest commits only add support for new Go versions (so active development of features and work on general reliability seem to have stopped). The tool can also run in dry-run mode, so we could test it to see what it would do.
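For the first option, a dry run of the built-in collector could be assembled roughly as follows. This is a sketch under assumptions: the binary is called `registry` and is on the PATH, the config path is a placeholder, and `gc_command` is a hypothetical helper. The `garbage-collect` subcommand and its `--dry-run` flag are part of Docker Distribution; the Pruner tool's invocation is not shown because we have not verified its options.

```python
import shlex


def gc_command(config_path, dry_run=True):
    """Build the argv for the registry's built-in garbage collector.

    With dry_run=True the collector only reports what it would delete.
    For a real run the registry should first be put in read-only mode
    (storage.maintenance.readonly.enabled in its config file).
    """
    argv = ["registry", "garbage-collect"]
    if dry_run:
        argv.append("--dry-run")
    argv.append(config_path)
    return argv


# Placeholder config path, to be replaced with the real one.
print(shlex.join(gc_command("/etc/docker/registry/config.yml")))
```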
Proposed plan
- Run both tools in dry-run mode, and report back timings and what they would act on.
- Establish whether either result is viable, and if so proceed with a real garbage collection.
- Think about how to streamline this, even if we start from "we do this manually every 3 months".
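To report timings for the dry runs in the first step of the plan, a small wrapper along these lines might be enough. It is only a sketch: `time_dry_run` is a hypothetical helper, it buffers all output in memory (which may be a poor fit for a run lasting days), and the command shown in the comment is an assumed invocation to adapt.

```python
import subprocess
import sys
import time


def time_dry_run(command):
    """Run a command and return (elapsed_seconds, exit_code, output_lines).

    stderr is merged into stdout so that everything the tool logs is kept.
    """
    start = time.monotonic()
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    elapsed = time.monotonic() - start
    return elapsed, result.returncode, result.stdout.splitlines()


# Demonstration with a harmless command. For the real test, pass something like
# ["registry", "garbage-collect", "--dry-run", "<path to config.yml>"] instead.
elapsed, code, lines = time_dry_run([sys.executable, "-c", "print('dry run')"])
print(f"exit={code} elapsed={elapsed:.2f}s output_lines={len(lines)}")
```

The same wrapper can be pointed at either tool, so the two timings are measured in the same way.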