
Some toolforge components are running an old version
Closed, Resolved, Public

Description

While upgrading the components for 1.29, I noticed that toolforge_get_versions.sh shows some outdated charts both in tools and toolsbeta:

fnegri@tools-bastion-12:~/toolforge-deploy$ ./utils/toolforge_get_versions.sh
| component | type | package name | version | comment |
| :-------: | :--: | :----------: | :-----: | :-----: |
| api-gateway | chart | api-gateway | api-gateway-0.0.64-20250303171648-bd834b88 |  |
| builds-api | chart | builds-api | builds-api-0.0.180-20250310171252-3a5bd08b |  |
| builds-builder | chart | builds-builder | builds-builder-0.0.121-20241003135043-98c2199c | toolforge-deploy has builds-builder-0.0.127-20250310170709-62faa3b3 |
| builds-cli | package | toolforge-builds-cli | 0.0.19 |  |
| calico | chart | calico | calico-0.0.14-20240920191533-f94f2f8d | toolforge-deploy has calico-0.0.15-20241104101859-e2c4ee9b |
| cert-manager | chart | cert-manager | cert-manager-v1.15.3 |  |
| components-api | chart | components-api | components-api-0.0.19-20241001092739-d389755b | toolforge-deploy has components-api-0.0.82-20250306090549-677c59f7 |
| components-cli | package | toolforge-components-cli | missing | |
| envvars-admission | chart | envvars-admission | envvars-admission-0.0.25-20250304093736-6bb51ac3 |  |
| envvars-api | chart | envvars-api | envvars-api-0.0.65-20250303153943-68878700 |  |
| envvars-cli | package | toolforge-envvars-cli | 0.0.12 |  |
| image-config | chart | image-config | image-config-0.0.20-20240209102849-75f6a5f8 |  |
| ingress-admission | chart | ingress-admission | ingress-admission-0.0.57-20250304210417-f60db3c7 |  |
| ingress-nginx-gen2 | chart | ingress-nginx-gen2 | ingress-nginx-4.11.2 |  |
| jobs-api | chart | jobs-api | jobs-api-0.0.359-20250311172027-e7e0ea6e |  |
| jobs-cli | package | toolforge-jobs-framework-cli | 16.1.8 |  |
| jobs-emailer | chart | jobs-emailer | jobs-emailer-0.0.54-20250304203340-d628ac53 |  |
| kyverno | chart | kyverno | kyverno-3.2.6 |  |
| maintain-harbor | chart | maintain-harbor | maintain-harbor-0.0.20-20250310171918-609fd0a4 |  |
| maintain-kubeusers | chart | maintain-kubeusers | maintain-kubeusers-0.0.171-20241105173021-bf5186a3 |  |
| registry-admission | chart | registry-admission | registry-admission-0.0.57-20250306095410-d4fed2c9 |  |
| toolforge-cli | package | toolforge-cli | 0.3.5 |  |
| toolforge-weld | package | python3-toolforge-weld | 1.6.8 |  |
| tools-webservice | package | toolforge-webservice | 0.103.15 |  |
| volume-admission | chart | volume-admission | volume-admission-0.0.63-20250311102045-0ca3b79d |  |
| wmcs-k8s-metrics | chart | wmcs-metrics | wmcs-k8s-metrics-0.0.20-20240628101504-9ed20c1f | toolforge-deploy has wmcs-k8s-metrics-0.0.21-20241104102245-a6f60a0d |
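The outdated rows are the ones whose comment column starts with "toolforge-deploy has". A minimal sketch for filtering them out of the table follows; `list_outdated` is a hypothetical helper, not part of toolforge-deploy, and it assumes the column layout shown above.

```shell
# Hypothetical helper: reads the markdown table on stdin and prints the
# component name of every row whose comment column reports a newer version.
list_outdated() {
    awk -F'|' '
        # Splitting on "|": field 2 is the component, field 6 is the comment.
        $6 ~ /toolforge-deploy has/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim surrounding whitespace
            print $2
        }
    '
}
```

It could be used as `./utils/toolforge_get_versions.sh | list_outdated`.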

@dcaro found that this is caused by our ./deploy.sh script using helmfile apply instead of helmfile sync. helmfile apply only updates the installed chart if it finds a diff in the k8s resources, so a new chart version that renders the same resources never gets installed.
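The difference can be sketched as follows. This is not the actual ./deploy.sh, only an illustration of running helmfile diff followed by helmfile sync; the `deploy_component` function, the `HELMFILE` override variable and the `name=<component>` selector are assumptions made for the example.

```shell
# Sketch only, not the real deploy.sh. HELMFILE can be overridden,
# for example with a stub, to try this without touching a cluster.
HELMFILE="${HELMFILE:-helmfile}"

deploy_component() {
    local component="$1"

    # Show the pending resource changes for review. An empty diff does not
    # mean the installed chart version is current.
    "$HELMFILE" --selector "name=${component}" diff || return 1

    # sync runs the helm upgrade for the release regardless of the diff,
    # so the installed chart version is always brought up to date.
    "$HELMFILE" --selector "name=${component}" sync
}
```

With helmfile apply, the second step would be skipped whenever the first one reports no changes, which matches the stale versions in the table.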

Event Timeline

fnegri changed the task status from Open to In Progress. Mar 13 2025, 2:46 PM
fnegri moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 18) board.

I replaced helmfile apply with helmfile diff + helmfile sync in https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/712

I could then sync the versions in toolsbeta by running ./deploy.sh <component> for all the outdated components.

In tools, ./deploy.sh builds-builder failed with:

Upgrading release=builds-builder, chart=/tmp/helmfile3218089280/builds-builder/builds-builder/builds-builder/0.0.127-20250310170709-62faa3b3/builds-builder, namespace=builds-builder

FAILED RELEASES:
NAME             NAMESPACE        CHART                  VERSION   DURATION
builds-builder   builds-builder   tools/builds-builder                  11s

in ./helmfile.yaml: failed processing release builds-builder: command "/usr/sbin/helm" exited with non-zero status:

PATH:
  /usr/sbin/helm

ARGS:
  0: helm (4 bytes)
  1: upgrade (7 bytes)
  2: --install (9 bytes)
  3: builds-builder (14 bytes)
  4: /tmp/helmfile3218089280/builds-builder/builds-builder/builds-builder/0.0.127-20250310170709-62faa3b3/builds-builder (115 bytes)
  5: --version (9 bytes)
  6: 0.0.127-20250310170709-62faa3b3 (31 bytes)
  7: --create-namespace (18 bytes)
  8: --namespace (11 bytes)
  9: builds-builder (14 bytes)
  10: --values (8 bytes)
  11: /tmp/helmfile1420318672/builds-builder-builds-builder-values-f487bcb89 (70 bytes)
  12: --reset-values (14 bytes)
  13: --history-max (13 bytes)
  14: 10 (2 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: UPGRADE FAILED: cannot patch "toolforge-buildpacks-phases" with kind Task: admission webhook "webhook.pipeline.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: json: unknown field "name"

COMBINED OUTPUT:
  Error: UPGRADE FAILED: cannot patch "toolforge-buildpacks-phases" with kind Task: admission webhook "webhook.pipeline.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: json: unknown field "name"

For future reference, the fix for the above was to:

  • Get a backup of the validatingwebhookconfiguration and mutatingwebhookconfiguration for tekton
  • Get a rendered version of the new task (using helm to generate it, or manually populating the fields)
    • helm pull oci://tools-harbor.wmcloud.org/toolforge/builds-builder --destination $PWD/helmchart --untar --version 0.0.127-20250310170709-62faa3b3 (the version comes from running the ./deploy.sh script with --debug added to helmfile, which shows the commands it runs)
    • copy the values from toolforge-deploy/component/builds-builder/values/tools.yaml.gotmpl into the chart's values.yaml (they were only auth-related values)
    • helm template --output-dir ./render
    • find the rendered task inside there
  • Update the task
  • Restore the webhooks
  • Do a deploy with deploy.sh
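
The rendering steps above can be sketched as a small shell function. This is an illustration under assumptions, not the exact commands that were run: `render_builds_builder_task`, the `HELM` override variable and the scratch directory argument are hypothetical, the chart is assumed to unpack into `helmchart/builds-builder`, and the chart path passed to helm template is filled in for the example. The webhook backup and restore and the task update itself are left out because they depend on the cluster's resource names.

```shell
# Sketch only. HELM can be overridden, for example with a stub, for dry runs.
HELM="${HELM:-helm}"

render_builds_builder_task() {
    version="$1"   # e.g. 0.0.127-20250310170709-62faa3b3, as shown by helmfile --debug
    workdir="$2"   # hypothetical scratch directory

    # Download and unpack the chart.
    "$HELM" pull oci://tools-harbor.wmcloud.org/toolforge/builds-builder \
        --destination "$workdir/helmchart" --untar --version "$version" || return 1

    # Manual step, not automated here: copy the values from
    # tools.yaml.gotmpl into the unpacked chart's values.yaml.

    # Render the manifests (assumes the chart sits in helmchart/builds-builder).
    "$HELM" template builds-builder "$workdir/helmchart/builds-builder" \
        --output-dir "$workdir/render" || return 1

    # Print the rendered file(s) defining the Task named in the error message.
    grep -rl 'name: toolforge-buildpacks-phases' "$workdir/render"
}
```

For example, `render_builds_builder_task 0.0.127-20250310170709-62faa3b3 "$PWD"` would print the path of the rendered task manifest to apply in the "Update the task" step.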
fnegri moved this task from In Progress to Done on the Toolforge (Toolforge iteration 18) board.
fnegri moved this task from In progress to Done on the cloud-services-team (FY2024/2025-Q3-Q4) board.