
[tbs] fix/improve the updating of the buildpack/tekton images in the local repo
Closed, Resolved · Public · 5 Estimated Story Points

Description

Currently we have a cookbook, wmcs.toolforge.buildservice.upload_images_to_repo, that updates the lifecycle, builder
and tekton images for the given tags. The distroless/base image, however, is referenced by sha rather than by tag,
so we need the sha of the uploaded image before we can create the commit in the buildservice repo that pulls it,
and that sha is only known after pushing the image.

So this task is to give back, in some form, a summary of what you have to patch in the buildservice repo to get the
newly updated images working.

Event Timeline

Hello @dcaro , playing around a bit with the cookbook cookbooks/wmcs/toolforge/buildservice/upload_images_to_repo.py with @fnegri , we found out that the hash of the updated distroless image is already being printed to the command line, and the file buildservice/deploy/base-tekton/tekton-pipelines-controller-patch.json is already embedding the docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4 image. To update, we only need to change the hash of the image above to the current value. This task is about improving this process, but it's unclear what improvement means in this case.

  1. Are we to research and find a way to make it possible that we no longer need to manually change the hash in the buildservice repo?
  2. Or are we to make changes to the cookbook to somehow make it clearer that the hash being printed to the command line for the docker push command should be copied and manually added to the buildservice repo?

  1. Are we to research and find a way to make it possible that we no longer need to manually change the hash in the buildservice repo?

This is out of the original scope of the task, but if you want to try to tackle it you are welcome to :). It might get tricky, as we don't really have anything like that yet.

  2. Or are we to make changes to the cookbook to somehow make it clearer that the hash being printed to the command line for the docker push command should be copied and manually added to the buildservice repo?

This was the original scope yep, maybe just printing a message like:

Don't forget to update the file `buildservice/deploy/base-tekton/tekton-pipelines-controller-patch.json` in the buildservice repo with the contents:
----
<...>
docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:<new_hash>
<...>
----

or similar
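
A minimal sketch of how such a reminder could be assembled, assuming the cookbook already has the digest of the pushed image in hand. The function name `format_update_reminder` and its parameters are hypothetical and are not taken from the cookbook's actual code.

```python
def format_update_reminder(image: str, digest: str, patch_file: str) -> str:
    """Build the reminder text to print after pushing a digest-pinned image.

    All three parameters are illustrative: `image` is the repository without
    tag or digest, `digest` is the "sha256:..." value reported by the push,
    and `patch_file` is the path that has to be edited by hand.
    """
    if not digest.startswith("sha256:"):
        raise ValueError(f"expected a sha256 digest, got {digest!r}")
    separator = "----"
    return "\n".join(
        [
            f"Don't forget to update the file `{patch_file}` "
            "in the buildservice repo with the contents:",
            separator,
            "<...>",
            f"{image}@{digest}",
            "<...>",
            separator,
        ]
    )


if __name__ == "__main__":
    print(
        format_update_reminder(
            image="docker-registry.tools.wmflabs.org/toolforge-distroless-base",
            digest="sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4",
            patch_file="buildservice/deploy/base-tekton/tekton-pipelines-controller-patch.json",
        )
    )
```

Rejecting anything that is not a `sha256:` digest keeps a tag from being pasted into a file that pins by digest.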

Ok I think it's clear now, thank you!

Change 859582 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/cookbooks@wmcs] cookbooks: print out instructions on next step after updating the buildpack/tekton images in the local repo

https://gerrit.wikimedia.org/r/859582

Change 859582 merged by jenkins-bot:

[operations/cookbooks@wmcs] cookbooks: print out instructions on next step after updating the buildpack/tekton images in the local repository

https://gerrit.wikimedia.org/r/859582

aborrero added subscribers: taavi, aborrero.

Reopening, since we found a problem with this workflow today and this is probably worth some additional discussion.

The current deployment manifest at https://gitlab.wikimedia.org/repos/cloud/toolforge/buildservice/-/blob/main/deploy/base-tekton/tekton-pipelines-controller-patch.json#L45 points to
docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4 .

This is what is deployed on k8s at the moment (so, no config drift):

aborrero@toolsbeta-test-k8s-control-4:~$ sudo -i kubectl -n tekton-pipelines get deployment.apps/tekton-pipelines-controller -o yaml | grep distroless
        - docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4
aborrero@tools-k8s-control-4:~$ sudo -i kubectl -n tekton-pipelines get deployment.apps/tekton-pipelines-controller -o yaml | grep distroless
        - docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4

However, this image is no longer present on the docker registry after a cleanup performed a few days ago by @taavi :

2023-07-10 20:39 taavi: freeing up disk space usage on tools docker-registry with `taavi@tools-docker-registry-05:~$ sudo sudo -u docker-registry docker-registry garbage-collect /etc/docker/registry/config.yml --delete-untagged`

A new buildservice deployment from scratch (for example, using lima-kilo) will fail to execute the build pipeline:

Warning  Failed     12s (x2 over 27s)  kubelet            Failed to pull image "docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4": rpc error: code = NotFound desc = failed to pull and unpack image "docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4": failed to resolve reference "docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4": docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4: not found

However, the live tools/toolsbeta deployments are (miraculously) fine because the referenced image is cached locally on each worker node:

Normal  Pulled     61s   kubelet            Container image "docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4" already present on machine

I don't know yet how to improve this workflow, but perhaps we could tag the distroless-base image even if that tag is not used by the deployment manifest.
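
A missing image like the one above could in principle be caught before a fresh deployment by asking the registry whether the pinned manifest still exists. The following is only a sketch under several assumptions: the registry speaks the Registry HTTP API v2 over HTTPS, it allows anonymous reads, and the image reference carries either a tag or a digest but not both. The names `ImageRef`, `parse_image_ref`, `manifest_url` and `manifest_exists` are hypothetical helpers, not part of any existing cookbook.

```python
import urllib.error
import urllib.request
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    reference: str  # either a tag or a "sha256:..." digest


def parse_image_ref(image: str) -> ImageRef:
    """Split 'registry/repo@digest' or 'registry/repo[:tag]' into its parts."""
    registry, _, remainder = image.partition("/")
    if not remainder:
        raise ValueError(f"no registry host in {image!r}")
    if "@" in remainder:
        repository, _, reference = remainder.partition("@")
    elif ":" in remainder:
        repository, _, reference = remainder.rpartition(":")
    else:
        # No tag and no digest: container tooling defaults to "latest".
        repository, reference = remainder, "latest"
    return ImageRef(registry, repository, reference)


def manifest_url(ref: ImageRef) -> str:
    """Registry HTTP API v2 endpoint for a manifest (HTTPS is assumed)."""
    return f"https://{ref.registry}/v2/{ref.repository}/manifests/{ref.reference}"


def manifest_exists(image: str, timeout: float = 10.0) -> bool:
    """HEAD the manifest; True on 200, False on 404, raise on anything else."""
    accept = ", ".join(
        [
            "application/vnd.docker.distribution.manifest.v2+json",
            "application/vnd.docker.distribution.manifest.list.v2+json",
            "application/vnd.oci.image.manifest.v1+json",
            "application/vnd.oci.image.index.v1+json",
        ]
    )
    request = urllib.request.Request(
        manifest_url(parse_image_ref(image)),
        method="HEAD",
        headers={"Accept": accept},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise
```

Calling `manifest_exists(...)` with the reference from the deployment manifest would then report whether a node without a local cache could still pull it. It is worth noting that a garbage collection run with `--delete-untagged` removes manifests that no tag points to, which is why a digest-only image is exposed to this kind of cleanup.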

Change 939673 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/wmcs-cookbooks@main] toolforge: buildservice: upload_images_to_repo: tag distroless-base image

https://gerrit.wikimedia.org/r/939673

The registry only contains sha256:2d4d28e45bbe4e38177fd4fdc922dbfaf95e607b06bbc4187a90410d895b4491, with sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4 having been flushed by the registry cleanup maintenance.

This may indicate a missing refresh on the deployment after running the cookbook.

So there is a drift after all, between what's in the registry and what's in both git and deployed.

This could be solved using a tag, maybe even :latest in the deployment manifest?
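
The drift described here (a digest pinned in git and deployed, but absent from the registry) can be expressed as a small comparison. This is a rough sketch only: it assumes the pinned references can be found in the manifest text with a regular expression, and that the set of digests present in the registry has been collected by some other means. `pinned_digests`, `find_drift` and the shape of `registry_digests` are hypothetical.

```python
import re
from typing import Dict, List, Set

# Matches "<image>@sha256:<64 hex chars>" anywhere in a text blob.
DIGEST_REF = re.compile(r"([\w.\-/:]+)@(sha256:[0-9a-f]{64})")


def pinned_digests(text: str) -> Dict[str, str]:
    """Map each digest-pinned image found in the text to its digest."""
    return {match.group(1): match.group(2) for match in DIGEST_REF.finditer(text)}


def find_drift(manifest_text: str, registry_digests: Dict[str, Set[str]]) -> List[str]:
    """Return 'image@digest' references pinned in the manifest text
    whose digest is not among those known to be in the registry."""
    missing = []
    for image, digest in pinned_digests(manifest_text).items():
        if digest not in registry_digests.get(image, set()):
            missing.append(f"{image}@{digest}")
    return missing
```

With the manifest pinning one digest while the registry holds only a different one, `find_drift` would return the pinned reference, which is the situation reported in this comment. Referencing a tag instead of a digest avoids this particular mismatch, at the cost of no longer pinning an exact image.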

Change 939673 abandoned by Arturo Borrero Gonzalez:

[cloud/wmcs-cookbooks@main] toolforge: buildservice: upload_images_to_repo: tag distroless-base image

Reason:

:latest is the default and it might be enough

https://gerrit.wikimedia.org/r/939673

Mentioned in SAL (#wikimedia-cloud) [2023-07-19T15:09:14Z] <arturo> try to rescue docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4 back into the registry from a k8s worker local cache (T321188)

Change 939728 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/wmcs-cookbooks@main] toolforge: buildservice: upload_images_to_repo: remove misleading comment

https://gerrit.wikimedia.org/r/939728

Mentioned in SAL (#wikimedia-cloud) [2023-07-19T15:37:34Z] <arturo> root@tools-docker-registry-05:~# curl -sS -X DELETE localhost:5000/v2/toolforge-distroless-base/manifests/sha256:2d4d28e45bbe4e38177fd4fdc922dbfaf95e607b06bbc4187a90410d895b4491 (T321188)

Mentioned in SAL (#wikimedia-cloud) [2023-07-19T15:38:06Z] <arturo> root@tools-docker-registry-05:~# docker-registry garbage-collect /etc/docker/registry/config.yml (T321188)

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-19T16:05:08Z] <wm-bot2> updating docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:c11cf17ee8a54dd3a44908ed3f38ffbfb41f1c8c6a2264de9b3e2f5ef4576006 (T321188) - cookbook ran by arturo@nostromo

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-19T16:30:07Z] <wm-bot2> updating docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4 (T321188) - cookbook ran by arturo@nostromo

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-19T16:34:37Z] <wm-bot2> updating docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:77051c1e40d180d0695b5a9ba7a15161ecac7220ea8c1ed6721bd1c8329b1b2f (T321188) - cookbook ran by arturo@nostromo

Change 939746 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/wmcs-cookbooks@main] toolforge: buildservice: upload_images_to_repo: use explicit tags

https://gerrit.wikimedia.org/r/939746

Change 939746 merged by Arturo Borrero Gonzalez:

[cloud/wmcs-cookbooks@main] toolforge: buildservice: upload_images_to_repo: use explicit tags

https://gerrit.wikimedia.org/r/939746

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-20T11:25:45Z] <wm-bot2> updating docker-registry.tools.wmflabs.org/toolforge-distroless-base:latest (T321188) - cookbook ran by arturo@endurance

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-20T11:27:19Z] <wm-bot2> updating docker-registry.tools.wmflabs.org/toolforge-distroless-base:debug (T321188) - cookbook ran by arturo@endurance

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-20T13:09:11Z] <wm-bot2> updating docker-registry.tools.wmflabs.org/toolforge-distroless-base-debug:latest (T321188) - cookbook ran by arturo@nostromo

Testing in toolsbeta showed some registry pull problems (T342338: toolforge docker registry: unable to pull docker-registry.tools.wmflabs.org/toolforge-distroless-base:latest), which forced me to rename the image and add the -debug suffix to try to work around it.

Final notes:

I have detected that more modern tekton versions change this image yet again, so we should do a careful evaluation when doing a future migration.