Upgrade K8s docker images running in Wikimedia production on Buster to either Bullseye or Bookworm
Closed, Resolved, Public

Description

Hi folks!

Our dear Buster will not be supported by Debian for much longer, so we should upgrade to either Bullseye or Bookworm. Debmonitor is now able to report the Debian version-id for a lot of the Docker images in our registry:
https://debmonitor.wikimedia.org/images/
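To spot-check a single image locally, one option is to read `/etc/os-release` from it: Debian records `VERSION_ID` there (10 for Buster, 11 for Bullseye, 12 for Bookworm). This is a minimal sketch, not how Debmonitor works internally. The helper names are hypothetical, and it assumes Docker is available and that the image ships both `cat` and `/etc/os-release` (minimal images may lack either):

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking which Debian release an image is based on.

# Print the value of KEY ($1) from os-release content on stdin, quotes stripped.
os_release_value() {
  sed -n "s/^$1=//p" | tr -d '"'
}

# Print the Debian VERSION_ID of image $1.
# Assumption: the image contains /etc/os-release and a `cat` binary.
image_debian_version() {
  docker run --rm --entrypoint cat "$1" /etc/os-release | os_release_value VERSION_ID
}

# Succeed if the given VERSION_ID is Buster (10) or older.
# An empty VERSION_ID (e.g. testing/sid) is treated as "no upgrade needed".
needs_upgrade() {
  [ -n "$1" ] && [ "$1" -le 10 ]
}
```

Usage, with `$IMAGE` standing in for any registry image reference: `needs_upgrade "$(image_debian_version "$IMAGE")" && echo "$IMAGE is still on Buster or older"`.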

Core images:

  • envoy
  • envoy-future
  • cfssl-issuer
  • coredns
  • helm-state-metrics
  • echoserver (not really core, but used by folks for testing, etc.)

Mediawiki-related:

  • mcrouter
  • prometheus-mcrouter-exporter

Misc:

  • fluent-bit (Removed since not used anymore)
  • haproxy

Will add more as I review Debmonitor's report in more detail.

Details

Related Changes in Gerrit:
Repo | Branch | Lines +/-
operations/deployment-charts | master | +0 -2
operations/deployment-charts | master | +8 -2
operations/puppet | production | +1 -1
operations/deployment-charts | master | +3 -8
operations/deployment-charts | master | +4 -0
operations/deployment-charts | master | +4 -0
generated-data-platform/datasets/image-suggestions | main | +3 -2
operations/deployment-charts | master | +6 -0
operations/puppet | production | +2 -2
operations/deployment-charts | master | +6 -1
operations/deployment-charts | master | +67 -2
operations/puppet | production | +1 -2
operations/deployment-charts | master | +2 -0
operations/docker-images/production-images | master | +7 -1
operations/puppet | production | +1 -0
operations/deployment-charts | master | +0 -3
operations/deployment-charts | master | +1 -0
operations/deployment-charts | master | +2 -2
operations/deployment-charts | master | +10 -3
operations/deployment-charts | master | +1 -0
operations/deployment-charts | master | +1 -0
operations/docker-images/production-images | master | +19 -7
operations/docker-images/production-images | master | +10 -3
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +14 -2
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +14 -2
operations/docker-images/production-images | master | +8 -2
operations/software/cfssl-issuer | main | +1 -1

Event Timeline

Change #1049825 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/software/cfssl-issuer@main] Makefile: use 'go install' instead of 'go get'

https://gerrit.wikimedia.org/r/1049825

Change #1049828 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] echoserver: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049828

Change #1049825 merged by Elukey:

[operations/software/cfssl-issuer@main] Makefile: use 'go install' instead of 'go get'

https://gerrit.wikimedia.org/r/1049825

Change #1049838 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] cfssl-issuer: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049838

Change #1049577 merged by Elukey:

[operations/docker-images/production-images@master] coredns: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049577

Change #1049578 merged by Elukey:

[operations/docker-images/production-images@master] envoy: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049578

Change #1049586 merged by Elukey:

[operations/docker-images/production-images@master] helm-state-metrics: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049586

Change #1049588 merged by Elukey:

[operations/docker-images/production-images@master] prometheus-exporters: upgrade mcrouter and statsd to Bookworm

https://gerrit.wikimedia.org/r/1049588

Change #1049590 merged by Elukey:

[operations/docker-images/production-images@master] service-checker: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049590

Change #1049591 merged by Elukey:

[operations/docker-images/production-images@master] nutcracker: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049591

Change #1049828 merged by Elukey:

[operations/docker-images/production-images@master] echoserver: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049828

Change #1049838 merged by Elukey:

[operations/docker-images/production-images@master] cfssl-issuer: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049838

Change #1050568 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: upgrade coredns to 1.8.7-2

https://gerrit.wikimedia.org/r/1050568

Change #1050569 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: upgrade cfssl-issuer's Docker image

https://gerrit.wikimedia.org/r/1050569

Change #1050570 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] api,rest-gateway: upgrade Envoy version

https://gerrit.wikimedia.org/r/1050570

Change #1050571 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: update helm-state-metrics' Docker image version

https://gerrit.wikimedia.org/r/1050571

Change #1049587 merged by Elukey:

[operations/docker-images/production-images@master] mcrouter: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049587

Change #1050568 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: upgrade coredns to 1.8.7-2

https://gerrit.wikimedia.org/r/1050568

Change #1050569 merged by Elukey:

[operations/deployment-charts@master] admin_ng: upgrade cfssl-issuer's Docker image

https://gerrit.wikimedia.org/r/1050569

Change #1051111 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] cfssl-issuer: Add container securityContext

https://gerrit.wikimedia.org/r/1051111

Change #1051111 merged by jenkins-bot:

[operations/deployment-charts@master] cfssl-issuer: Add container securityContext

https://gerrit.wikimedia.org/r/1051111

Change #1051132 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: remove coredns image tag override for ml-staging-codfw

https://gerrit.wikimedia.org/r/1051132

Change #1051132 merged by Elukey:

[operations/deployment-charts@master] admin_ng: remove coredns image tag override for ml-staging-codfw

https://gerrit.wikimedia.org/r/1051132

Change #1050570 merged by Elukey:

[operations/deployment-charts@master] api,rest-gateway: upgrade Envoy version

https://gerrit.wikimedia.org/r/1050570

Change #1050571 merged by Elukey:

[operations/deployment-charts@master] admin_ng: update helm-state-metrics' Docker image version

https://gerrit.wikimedia.org/r/1050571

elukey triaged this task as Medium priority.

Built and rolled out the images listed in the description to staging envs. The next step is to roll them out to production (all clusters).

Caveats:

  • the envoy image is spread across a ton of containers since we use it as the mesh sidecar, so rolling out the change will be interesting :D (maybe we can just let the next deployments pick it up over time).

Change #1051402 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] wmfdebug: Upgrade to Bookworm

https://gerrit.wikimedia.org/r/1051402

Change #1051740 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mw-mcouter: use bookworm images

https://gerrit.wikimedia.org/r/1051740

Change #1052080 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::builder: add mcrouter uid for docker-pkg

https://gerrit.wikimedia.org/r/1052080

Change #1052080 merged by Elukey:

[operations/puppet@production] role::builder: add mcrouter uid for docker-pkg

https://gerrit.wikimedia.org/r/1052080

From the mcrouter side of things, we hope to have T346690 sorted soon, which will mean that mw-* pods will no longer have a mcrouter container but will use the mw-mcrouter daemonset instead. In other words, rollout and deployment will be faster.

Unless anything unexpected comes up, we may roll out the bookworm images sometime next week or the week after.

Change #1051402 merged by Elukey:

[operations/docker-images/production-images@master] wmfdebug: Upgrade to Bookworm

https://gerrit.wikimedia.org/r/1051402

Change #1052263 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: upgrade mesh's envoy Docker version

https://gerrit.wikimedia.org/r/1052263

Change #1052263 merged by jenkins-bot:

[operations/deployment-charts@master] services: upgrade mesh's envoy Docker version

https://gerrit.wikimedia.org/r/1052263

Change #1052691 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::deployment_server::kubernetes: update Envoy's version

https://gerrit.wikimedia.org/r/1052691

Change #1052691 merged by Elukey:

[operations/puppet@production] role::deployment_server::kubernetes: update Envoy's version

https://gerrit.wikimedia.org/r/1052691

In T368523 we’re seeing an “unable to get local issuer certificate” error that may or may not be related to the new Envoy version; it’s not very urgent (only affects a test wiki) but I’d be very thankful if someone could take a look :)

Edit: Further investigation shows the failure is not related to envoy after all.

Change #1054367 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-debug

https://gerrit.wikimedia.org/r/1054367

Change #1054368 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-debug

https://gerrit.wikimedia.org/r/1054368

Change #1054367 merged by jenkins-bot:

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-debug

https://gerrit.wikimedia.org/r/1054367

Change #1054368 merged by jenkins-bot:

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-api-int

https://gerrit.wikimedia.org/r/1054368

Change #1054507 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes: update mcrouter images to bookworm

https://gerrit.wikimedia.org/r/1054507

Change #1054511 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mw-mcrouter: use bookworm images

https://gerrit.wikimedia.org/r/1054511

Change #1054511 merged by jenkins-bot:

[operations/deployment-charts@master] mw-mcrouter: use bookworm images

https://gerrit.wikimedia.org/r/1054511

Change #1054507 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes: update mcrouter images to bookworm

https://gerrit.wikimedia.org/r/1054507

Jdforrester-WMF renamed this task from "Upgrade K8s docker images to running in production on Buster with either Bullseye or Bookworm" to "Upgrade K8s docker images running in Wikimedia production on Buster to either Bullseye or Bookworm". (Jul 17 2024, 7:19 PM)

I used this horrible bash script to get a breakdown of image versions deployed on a given cluster:

for ns in `kubectl get ns | cut -d " " -f 1 | grep -v NAME`; do echo -e "\nnamespace: $ns\n"; kubectl get pods -n $ns -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" |tr -s '[[:space:]]' '\n' |sort |uniq -c; done

This is still painful, but I'm keeping a note here anyway :)
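For comparison, a possible variation on the loop above (not the script that was actually used) makes a single `kubectl get pods --all-namespaces` call with `-o custom-columns` instead of one call per namespace. It relies on kubectl rendering `[*]` matches as comma-separated lists and printing `<none>` for empty fields; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, sketched as an alternative to the per-namespace loop.

# Emit one line per pod: "NAMESPACE IMAGES INIT_IMAGES", where the image
# columns are comma-separated lists and "<none>" marks an empty field.
list_pod_images() {
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,IMAGES:.spec.containers[*].image,INIT:.spec.initContainers[*].image'
}

# Read list_pod_images output on stdin and print "COUNT NAMESPACE IMAGE",
# counting each (namespace, image) pair once per container that uses it.
count_images() {
  awk '{
    for (i = 2; i <= NF; i++) {
      n = split($i, imgs, ",")
      for (j = 1; j <= n; j++)
        if (imgs[j] != "<none>") print $1, imgs[j]
    }
  }' | sort | uniq -c
}
```

Usage: `list_pod_images | count_images`, optionally piped through `grep envoy` to follow how far a new sidecar version has spread. Splitting listing from counting keeps the counting step testable without a cluster.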

Nice! I keep a collection of horrors like that one at https://wikitech.wikimedia.org/wiki/Kubernetes/Kubectl/Cheat_Sheet. Please feel free to extend it! :)

Change #1051740 abandoned by Effie Mouzeli:

[operations/deployment-charts@master] mw-mcouter: use bookworm images

Reason:

already done

https://gerrit.wikimedia.org/r/1051740

Change #1058588 had a related patch set uploaded (by Elukey; author: Elukey):

[generated-data-platform/datasets/image-suggestions@main] blubber: update build syntax and use Bookworm and Golang 1.21

https://gerrit.wikimedia.org/r/1058588

Output from docker-report (k8s images), now configured to work only with Bullseye or newer images:

Jul 29 15:37:59 build2001 docker-report-k8s[4134263]: 2024-07-29 15:37:59,515 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/blubber-buildkit:v0.12.0. The image is not supported.
Jul 29 15:48:35 build2001 docker-report-k8s[4134263]: 2024-07-29 15:48:35,508 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/generated-data-platform-datasets-image-suggestions:stable. The image is not supported.
Jul 29 16:39:50 build2001 docker-report-k8s[4134263]: 2024-07-29 16:39:50,697 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:video. The image is not supported.
Jul 29 16:39:51 build2001 docker-report-k8s[4134263]: 2024-07-29 16:39:51,181 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-multiversion:protoprod. The image is not supported.
Jul 29 16:45:51 build2001 docker-report-k8s[4134263]: 2024-07-29 16:45:51,655 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-services-geoshapes:2021-03-04-093059-publish. The image is not supported.
Jul 29 16:46:30 build2001 docker-report-k8s[4134263]: 2024-07-29 16:46:30,345 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-services-kartotherian:kartotherian. The image is not supported.
Jul 29 16:57:27 build2001 docker-report-k8s[4134263]: 2024-07-29 16:57:27,261 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-webserver:production. The image is not supported.
Jul 29 17:01:18 build2001 docker-report-k8s[4134263]: 2024-07-29 17:01:18,096 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/research-mwaddlink:test. The image is not supported.
Jul 29 17:03:55 build2001 docker-report-k8s[4134263]: 2024-07-29 17:03:55,491 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/wikimedia-portals:2024-07-29-122629-production. The image is not supported.
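To turn warnings like these into a list of images still needing an upgrade, a small filter over the log lines is enough. This is a minimal sketch: the function name is hypothetical, and it assumes the warning text keeps exactly the format shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read docker-report log lines on stdin and print every
# image reference it could not report on, de-duplicated and sorted.
unsupported_images() {
  sed -n 's/.*Unable to create a report for \([^ ]*\)\. The image is not supported\..*/\1/p' \
    | sort -u
}
```

Usage, assuming the journal identifier matches the `docker-report-k8s` tag in the lines above: `journalctl -t docker-report-k8s --since today | unsupported_images`.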

Change #1058588 merged by jenkins-bot:

[generated-data-platform/datasets/image-suggestions@main] blubber: update build syntax and use Bookworm and Golang 1.21

https://gerrit.wikimedia.org/r/1058588

Change #1063032 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mw-debug: pilot bookworm statsd exporter image

https://gerrit.wikimedia.org/r/1063032

Change #1063033 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mw-api-int: pilot bookworm statsd exporter image

https://gerrit.wikimedia.org/r/1063033

Change #1063034 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] mediawiki: upgrade all statsd exporters to bookworm image

https://gerrit.wikimedia.org/r/1063034

Change #1063032 merged by jenkins-bot:

[operations/deployment-charts@master] mw-debug: pilot bookworm statsd exporter image

https://gerrit.wikimedia.org/r/1063032

thcipriani subscribed.

Noting here for visibility:

I'm unsure whether deployments happen frequently enough across all our services for these images to be picked up in a timely way.

Change #1063033 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: pilot bookworm statsd exporter image

https://gerrit.wikimedia.org/r/1063033

Change #1063034 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: upgrade all statsd exporters to bookworm image

https://gerrit.wikimedia.org/r/1063034

Mentioned in SAL (#wikimedia-operations) [2024-08-20T17:51:04Z] <swfrench-wmf> mediawiki statsd exporter deployments upgraded to bookworm-based image - T368366

Change #1068004 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::deployment_server::kubernetes: upgrade nutcracker

https://gerrit.wikimedia.org/r/1068004

Change #1068004 merged by Elukey:

[operations/puppet@production] role::deployment_server::kubernetes: upgrade nutcracker

https://gerrit.wikimedia.org/r/1068004

Most of the images have been migrated; some subtasks are still open, but they'll take time to complete.

Change #1191203 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] wikifeeds: Remove envoy image_version override

https://gerrit.wikimedia.org/r/1191203

Change #1191203 merged by jenkins-bot:

[operations/deployment-charts@master] wikifeeds: Remove envoy image_version override

https://gerrit.wikimedia.org/r/1191203