
Fix how we keep docker-pkg based images up to date
Closed, ResolvedPublic

Description

Right now we ensure docker-pkg images are up to date when used in Blubber by ensuring:

a) we use no caching

b) we always pull the latest image.

Every week we rebuild all these base images with the --nightly switch of docker-pkg, which applies a new batch of apt updates and keeps the base images up to date.

On the other hand, when we rebuild child images to add new features, we rebuild based on the official changelog, which might reference an older image that may or may not be broken. This leads to nasty bugs like T344438 (GPG keys from the outdated base image had expired on the upstream repository, which was fixed by forcing a rebuild).

I think there are three ways to solve this:

  1. We create a script that has permissions to commit to the production-images repo and that runs docker-pkg update weekly, using the base images as the starting point, then merges the changes. We will eventually have very large changelogs, but this will ensure consistency. Ideally, we move all of this to CI when we move to GitLab.
  2. We modify docker-pkg so that it doesn't use the version in the changelog for the base image, but any version with a more recent nightly. This would remove some of the advantages of using something like docker-pkg.
  3. We rethink docker-pkg as an application with a database where we keep track of dependencies and of which image tag was used to build the next one.

I think by far the simplest and most effective way of solving this problem is the first option (rebuild images on a weekly basis to catch up with base images updates). I need to check with release engineering:

  • How hard would it be to run docker-pkg in GitLab CI?
  • How hard would it be to grant a specific bot user the ability to push commits to the repository?

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 18 2023, 6:27 AM
Joe triaged this task as Medium priority. · Aug 18 2023, 6:28 AM

I slightly amended the task description. The thing I like about docker-pkg is that it effectively freezes the parentage of the images, which gives some control over what is being rebuilt or introduced in a new version of a child image, and that helps avoid unwanted side effects. Consider:

bullseye (SRE base image)
└── ci-bullseye (base scripts for legacy CI)
    └── php7.4 (php packages)
        └── composer-php74 (add composer)

I can then update the composer version without updating the php version, which avoids side effects.

Then, in a recent case, I updated scripts in an image, which rightfully caused the rebuild of its child images, and some of them thus had Debian packages magically upgraded (notably chromium, which caused test failures). A while ago we had to update apt (which is in the base image), which forced us to rebuild all the images; that in turn caused all packages to be upgraded and caused issues in the test runs. That is what I call unexpected upgrades.

I guess the issue is that the images depend on an external state (the list of Debian packages fetched from the apt repository) and the images are thus not reproducible (short of pointing to https://snapshot.debian.org/ to freeze the state).
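
To make the snapshot idea concrete, here is a minimal sketch of what pinning apt to a point in time could look like. The helper name and the example date are made up for illustration and are not part of docker-pkg; the URL layout follows the public snapshot.debian.org archive scheme, and the check-valid-until=no option is there because the Release files of older snapshots are past their validity date.

```python
from datetime import datetime, timezone


def snapshot_sources_line(when: datetime, suite: str = "bullseye",
                          components: tuple[str, ...] = ("main",)) -> str:
    """Return an apt sources.list line pinned to a snapshot.debian.org timestamp.

    Hypothetical helper for illustration only.
    """
    # snapshot.debian.org addresses archive states as YYYYMMDDTHHMMSSZ (UTC).
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    url = f"https://snapshot.debian.org/archive/debian/{stamp}/"
    return f"deb [check-valid-until=no] {url} {suite} {' '.join(components)}"


if __name__ == "__main__":
    print(snapshot_sources_line(datetime(2023, 8, 18, tzinfo=timezone.utc)))
```

An image built against such a line would see the same package list on every rebuild, at the cost of never receiving updates until the timestamp is bumped.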

I don't know about running docker-pkg inside GitLab (Docker in Docker?). Locally I have been running docker-pkg using Podman/Buildah and rootless containers, and maybe Podman/Buildah can be run from within a Docker container. There are, however, some slight differences between Docker and Podman which might be troublesome. There is also Docker BuildKit, which supports rootless builds (via RootlessKit).

A side idea I have had for a while is to handle Docker layers like git commits and write some tooling that lets you rebase a series of layers on top of a base image (similar to rebasing a feature branch on top of upstream). So that eventually one could:

Initial state:

bullseye layer 13245 | tag 0.1
└── myapplication layer abcdef | tag 0.1

Rebuild the base image forging a new one:

bullseye layer 13245 | tag 0.1
└── myapplication layer abcdef | tag 3.5
bullseye layer 67890 | tag 0.2

Rebase myapplication layer abcdef on top of the new bullseye image 67890 but leave it otherwise unchanged:

bullseye layer 13245 | tag 0.1
└── myapplication layer abcdef | tag 3.5
bullseye layer 67890 | tag 0.2
└── myapplication layer abcdef | tag 3.6

That way the myapplication layer is unchanged (which prevents unwanted side effects such as Debian packages being magically updated or the build failing due to an external resource), but the resulting image receives the updates from the new base layer. The drawback, though, is that if the myapplication layer depends on something in one of the parent layers (such as a compiled binary), the resulting rebased image is invalid (I guess that is why, in code, we have CI running tests to ensure the rebased feature branch still works).
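
As a toy model of that rebase, the sketch below treats an image as a tag plus an ordered tuple of layer ids. The Image class and the rebase function are hypothetical names invented here; nothing talks to a real registry, and image configuration and metadata are ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """An image modelled as a name, a tag and an ordered tuple of layer ids."""
    name: str
    tag: str
    layers: tuple[str, ...]


def rebase(child: Image, old_base: Image, new_base: Image, new_tag: str) -> Image:
    """Replay the child's own layers on top of new_base, leaving them unchanged."""
    n = len(old_base.layers)
    if child.layers[:n] != old_base.layers:
        raise ValueError(
            f"{child.name}:{child.tag} is not built on {old_base.name}:{old_base.tag}"
        )
    own_layers = child.layers[n:]  # layers added by the child itself
    return Image(child.name, new_tag, new_base.layers + own_layers)


if __name__ == "__main__":
    old = Image("bullseye", "0.1", ("13245",))
    new = Image("bullseye", "0.2", ("67890",))
    app = Image("myapplication", "3.5", ("13245", "abcdef"))
    print(rebase(app, old, new, "3.6").layers)
```

The model makes the drawback visible too: nothing in rebase can tell whether layer abcdef still works on top of 67890, so a test run on the rebased image would still be needed.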

</braindump>

If what you want is deterministic builds, it would be enough to NOT remove the apt archives from the base layer image, and then only run apt-get install in all other layers (unless you've added a new component), but I think this is a pretty specific need of CI images, to be honest. OS updates aren't usually problematic for anything else.

As for our future, I think I'll try to use BuildKit like kokkuri does, although I will need to study that a bit more. After all, the newer versions of docker-pkg allow doing that relatively easily.

For now it would be enough for us to just get a Gerrit account that we can use to:

  • submit and merge a change per week that adds a new changelog entry to all images in a repository as part of a weekly cron
  • build the production images after the change is merged

This is basically what scap backport and Trainbranchbot/Pipelinebot do.
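
The first step of that weekly job could look roughly like the sketch below, which prepends a new entry to one image's changelog. It assumes the changelog is in Debian changelog format, and the -YYYYMMDD version suffix is a hypothetical scheme chosen for illustration. It is not a description of the weekly-update script referenced in the changes below, and add_weekly_entry and bump_version are made-up names.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# First line of a Debian-style entry: "name (version) distribution; urgency=..."
HEADER_RE = re.compile(r"^(?P<name>\S+) \((?P<version>[^)]+)\) (?P<rest>.+)$")
DATE_SUFFIX_RE = re.compile(r"-\d{8}$")


def bump_version(version: str, when: datetime) -> str:
    """Hypothetical scheme: replace or append a -YYYYMMDD suffix."""
    base = DATE_SUFFIX_RE.sub("", version)
    return f"{base}-{when:%Y%m%d}"


def add_weekly_entry(changelog: str, author: str, when: datetime,
                     message: str = "Weekly rebuild on refreshed base images.") -> str:
    """Prepend a new entry derived from the most recent one."""
    lines = changelog.splitlines()
    if not lines:
        raise ValueError("empty changelog")
    match = HEADER_RE.match(lines[0])
    if match is None:
        raise ValueError(f"unrecognised changelog header: {lines[0]!r}")
    new_version = bump_version(match["version"], when)
    entry = (
        f"{match['name']} ({new_version}) {match['rest']}\n"
        "\n"
        f"  * {message}\n"
        "\n"
        f" -- {author}  {format_datetime(when)}\n"
        "\n"
    )
    return entry + changelog


if __name__ == "__main__":
    existing = (
        "composer-php74 (0.3.0-20231022) wikimedia; urgency=medium\n\n"
        "  * Previous update.\n\n"
        " -- Someone <someone@example.org>  Sun, 22 Oct 2023 06:00:00 +0000\n"
    )
    now = datetime(2023, 10, 29, 6, 0, tzinfo=timezone.utc)
    print(add_weekly_entry(existing, "Update Bot <bot@example.org>", now))
```

A real job would loop over every image directory, commit the result with the bot account, and only then trigger the build.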

Change 969303 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/docker-images/production-images@master] Add weekly-update script

https://gerrit.wikimedia.org/r/969303

Change 970204 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] Add weekly-update script

https://gerrit.wikimedia.org/r/970204

Change 970204 abandoned by Giuseppe Lavagetto:

[operations/deployment-charts@master] Add weekly-update script

Reason:

Wrong repo, I'm dumb

https://gerrit.wikimedia.org/r/970204

Change 969303 merged by Giuseppe Lavagetto:

[operations/docker-images/production-images@master] Add weekly-update script

https://gerrit.wikimedia.org/r/969303

Change 970391 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] docker::builder: add system to properly perform a weekly update

https://gerrit.wikimedia.org/r/970391

Change 970392 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] docker::builder: switch systemd timer to our new script

https://gerrit.wikimedia.org/r/970392

Change 970391 merged by Giuseppe Lavagetto:

[operations/puppet@production] docker::builder: add system to properly perform a weekly update

https://gerrit.wikimedia.org/r/970391

Change 970392 merged by Giuseppe Lavagetto:

[operations/puppet@production] docker::builder: switch systemd timer to our new script

https://gerrit.wikimedia.org/r/970392

Joe claimed this task.