
Fix alternatives entries in helm and kubernetes-client packages
Closed, Resolved, Public

Description

Both the helm and kubernetes-client packages use dependent alternative entries that should be grouped.

From https://gerrit.wikimedia.org/r/c/integration/config/+/1123363

Note that the Debian package defines two alternatives, helm and helm3, which really should have been made a link group using --slave:

# PRIORITY is a placeholder for the integer priority that --install requires.
update-alternatives \
  --install /usr/bin/helm helm /usr/bin/helm311 PRIORITY \
  --slave /usr/bin/helm3 helm3 /usr/bin/helm311

In kubernetes-client, the binary and bash-completion alternative entries should be grouped.
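As a rough illustration of what such a grouping could look like for kubectl, the sketch below only prints the command, it does not run it. The paths and the priority of 70 are taken from the `update-alternatives --display kubectl` output quoted further down in this task; the helper name `kubectl_alternatives_cmd` is hypothetical and the real postinst script may well differ.

```shell
#!/bin/sh
# Sketch only: prints (does not execute) a grouped update-alternatives call.
# kubectl_alternatives_cmd is a hypothetical helper, not part of any package.
kubectl_alternatives_cmd() {
  version="$1"   # e.g. 1.31
  priority="$2"  # e.g. 70; applies to the master link only, slaves take no priority
  printf '%s\n' "update-alternatives --install /usr/bin/kubectl kubectl /usr/bin/kubectl${version} ${priority} --slave /usr/share/bash-completion/completions/kubectl kubectl-completion /usr/share/bash-completion/completions/kubectl-${version}"
}

kubectl_alternatives_cmd 1.31 70
```

With this shape, switching the kubectl alternative also switches the completion file, because the slave link follows its master.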

Event Timeline

Change #1125377 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] helm: Install helm 3.11 and 3.17 in parallel

https://gerrit.wikimedia.org/r/1125377

Change #1125377 merged by JMeybohm:

[operations/puppet@production] helm: Install helm 3.11 and 3.17 in parallel

https://gerrit.wikimedia.org/r/1125377

Change #1135412 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/helm3@helm311] make helm3 alternative entry dependent on helm

https://gerrit.wikimedia.org/r/1135412

Change #1135412 merged by Jelto:

[operations/debs/helm3@helm311] make helm3 alternative entry dependent on helm

https://gerrit.wikimedia.org/r/1135412

Mentioned in SAL (#wikimedia-operations) [2025-04-16T13:51:58Z] <jelto> "Imported helm311 3.11.3-4 to bullseye-wikimedia and bookworm-wikimedia - T387548"

Change #1137010 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/helm3@master] make helm3 alternative entry dependent on helm

https://gerrit.wikimedia.org/r/1137010

Change #1137010 merged by Jelto:

[operations/debs/helm3@master] make helm3 alternative entry dependent on helm

https://gerrit.wikimedia.org/r/1137010

Mentioned in SAL (#wikimedia-operations) [2025-04-30T11:17:09Z] <jelto> "Imported helm317 3.17.0-2 to bullseye-wikimedia and bookworm-wikimedia - T387548"

With the new versions of helm311 and helm317 this should be fixed now. I tested this locally:

jelto-wmf@x1:~$ sudo apt-get install helm311 helm317
...
Setting up helm311 (3.11.3-4) ...
Setting up helm317 (3.17.0-2) ...
...
jelto-wmf@x1:~$ helm version
version.BuildInfo{Version:"v3.11.3", GitCommit:"", GitTreeState:"", GoVersion:"go1.23.5"}
jelto-wmf@x1:~$ helm3 version
version.BuildInfo{Version:"v3.11.3", GitCommit:"", GitTreeState:"", GoVersion:"go1.23.5"}
jelto-wmf@x1:~$
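Besides comparing the version strings, one could also check that both links resolve to the same binary. This is a minimal sketch, assuming GNU `readlink -f` is available (as it is on Debian); `same_target` is a hypothetical helper name.

```shell
#!/bin/sh
# same_target is a hypothetical helper: it succeeds only when both paths
# resolve (through all symlinks) to the same file.
same_target() {
  a=$(readlink -f "$1") || return 1
  b=$(readlink -f "$2") || return 1
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example use on a host with both packages installed:
#   same_target /usr/bin/helm /usr/bin/helm3 && echo "helm and helm3 are grouped"
```

This only confirms the current targets match; it does not prove that helm3 is registered as a slave of helm.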

I'll upload changes to remove the workaround from puppet and the container images.

Change #1140164 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] helm: remove duplicate alternatives::select entry

https://gerrit.wikimedia.org/r/1140164

Change #1140168 had a related patch set uploaded (by Jelto; author: Jelto):

[integration/config@master] helm-linter: Remove duplicate update-alternatives for helm3

https://gerrit.wikimedia.org/r/1140168

Change #1140168 merged by jenkins-bot:

[integration/config@master] helm-linter: Remove duplicate update-alternatives for helm3

https://gerrit.wikimedia.org/r/1140168

Change #1140641 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update helm-lint job to latest helm image

https://gerrit.wikimedia.org/r/1140641

Change #1140641 merged by jenkins-bot:

[integration/config@master] jjb: update helm-lint job to latest helm image

https://gerrit.wikimedia.org/r/1140641

Mentioned in SAL (#wikimedia-operations) [2025-05-12T14:44:23Z] <jelto> update helm311 and helm317 on deploy2002 - T387548

Change #1140164 merged by Jelto:

[operations/puppet@production] helm: remove duplicate alternatives::select entry

https://gerrit.wikimedia.org/r/1140164

Mentioned in SAL (#wikimedia-operations) [2025-05-12T16:05:07Z] <jelto> update helm311 and helm317 on deploy1003 - T387548

Mentioned in SAL (#wikimedia-operations) [2025-05-12T16:17:45Z] <jelto> update helm311 and helm317 on contint1002 contint2002 - T387548

This is fixed for helm. kubernetes-client needs an updated alternative entry (for the binary and bash-completion).

Change #1161513 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/kubernetes@v1.23] make kubectl-completion alternative entry dependent on kubectl

https://gerrit.wikimedia.org/r/1161513

Change #1161526 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/kubernetes@v1.31] make kubectl-completion alternative entry dependent on kubectl (v1.31)

https://gerrit.wikimedia.org/r/1161526

Change #1161526 merged by Jelto:

[operations/debs/kubernetes@v1.31] make kubectl-completion alternative entry dependent on kubectl (v1.31)

https://gerrit.wikimedia.org/r/1161526

Change #1161884 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/kubernetes@v1.31] fix newline in postinst script

https://gerrit.wikimedia.org/r/1161884

Change #1161884 merged by Jelto:

[operations/debs/kubernetes@v1.31] fix newline in postinst script

https://gerrit.wikimedia.org/r/1161884

Change #1161888 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/kubernetes@v1.31] remove priority from slave

https://gerrit.wikimedia.org/r/1161888

Change #1161888 merged by Jelto:

[operations/debs/kubernetes@v1.31] remove priority from slave

https://gerrit.wikimedia.org/r/1161888

Change #1161894 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/kubernetes@v1.31] remove priority from update-alternatives --remove

https://gerrit.wikimedia.org/r/1161894

Change #1161894 merged by Jelto:

[operations/debs/kubernetes@v1.31] remove priority from update-alternatives --remove

https://gerrit.wikimedia.org/r/1161894

Change #1161513 merged by Jelto:

[operations/debs/kubernetes@v1.23] make kubectl-completion alternative entry dependent on kubectl (v1.23)

https://gerrit.wikimedia.org/r/1161513

Mentioned in SAL (#wikimedia-operations) [2025-06-20T11:59:04Z] <jelto> import kubernetes 1.23.14-6 and 1.31.4-5 to apt host - T387548

I updated staging-codfw master nodes to the new kubernetes-client131 version. kubectl-completion is a slave of kubectl now:

jelto@kubestagemaster2003:~$ sudo update-alternatives --display kubectl
kubectl - manual mode
  link best version is /usr/bin/kubectl1.31
  link currently points to /usr/bin/kubectl1.31
  link kubectl is /usr/bin/kubectl
  slave kubectl-completion is /usr/share/bash-completion/completions/kubectl
/usr/bin/kubectl1.31 - priority 70
  slave kubectl-completion: /usr/share/bash-completion/completions/kubectl-1.31

The upgrade required a manual removal of the kubectl-completion alternatives group, because kubernetes-client123 also used this group.

Kubectl looks good on the master nodes.

JMeybohm raised the priority of this task from Low to High. Jun 23 2025, 2:53 PM

The refactoring of bash-completion as a slave alternative caused issues during the Kubernetes upgrade in T397148.

The wikikubectl nodes had a version of kubernetes-client123 (1.23.14-5) installed, which still defines kubectl-completion as a master alternative group.

Then the puppet run triggered the installation of kubernetes-client131 (1.31.4-5), which tried to define kubectl-completion as a slave of kubectl. This failed with:

update-alternatives: error: alternative kubectl-completion can't be slave of kubectl: it is a master alternative

I manually deleted the alternatives group with sudo update-alternatives --remove-all kubectl-completion to solve this. After another puppet run, the new version of kubernetes-client was installed properly.

I'll upload a patch to prevent this from happening when upgrading the package on the other hosts.
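For illustration, a guard of this kind might look like the sketch below. It is not the content of any Gerrit change referenced in this task; `drop_master_group` and the `UA` override variable are hypothetical names. It relies on `update-alternatives --query NAME` failing when NAME is not a master group.

```shell
#!/bin/sh
# UA can be overridden so the function can be exercised with a stub.
UA="${UA:-update-alternatives}"

# drop_master_group is a hypothetical helper: if NAME currently exists as a
# master alternatives group, remove the whole group so that NAME can later be
# registered as a slave of another group.
drop_master_group() {
  name="$1"
  if "$UA" --query "$name" >/dev/null 2>&1; then
    "$UA" --remove-all "$name"
  fi
}

# Example (as root, before installing the new package version):
#   drop_master_group kubectl-completion
```

On a host where kubectl-completion is already a slave, the query fails and nothing is removed.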

Change #1163287 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/kubernetes@v1.31] remove kubectl-completion master group before adding alternatives

https://gerrit.wikimedia.org/r/1163287

Mentioned in SAL (#wikimedia-operations) [2025-06-24T09:56:57Z] <jelto> remove kubernetes-client123 (1.23.14-5) form kubestargemaster100[3-5] - T387548

Change #1163287 abandoned by Jelto:

[operations/debs/kubernetes@v1.31] remove kubectl-completion master group before adding alternatives

Reason:

a different approach is used

https://gerrit.wikimedia.org/r/1163287

Mentioned in SAL (#wikimedia-operations) [2025-06-24T11:28:48Z] <jelto> bump kubernetes-client to newest version on kubestagemaster100[3-5] - T387548

Mentioned in SAL (#wikimedia-operations) [2025-06-24T11:44:25Z] <jelto> bump kubernetes-client to newest version on deploy1003 and deploy2002 - T387548

Mentioned in SAL (#wikimedia-operations) [2025-06-24T12:13:18Z] <jelto> bump kubernetes-client to newest version on ml-staging-ctrl200[12] - T387548

Mentioned in SAL (#wikimedia-operations) [2025-06-24T12:43:39Z] <jelto> bump kubernetes-client to newest version on dse-k8s-ctrl100[12] - T387548

Mentioned in SAL (#wikimedia-operations) [2025-06-24T12:50:55Z] <jelto> bump kubernetes-client to newest version on aux-k8s-ctrl* - T387548

Mentioned in SAL (#wikimedia-operations) [2025-06-24T13:55:21Z] <jelto> bump kubernetes-client to newest version on ml_serve-ctrl* - T387548

Mentioned in SAL (#wikimedia-operations) [2025-06-24T14:04:17Z] <jelto> bump kubernetes-client to newest version on wikikube-ctrl100[1-4] - T387548

Quoting my earlier comment: "I'll upload a patch to prevent this from happening when upgrading the package on the other hosts."

This idea was discarded. I just updated all kubernetes-client packages (1.23 and 1.31) on all hosts to make sure they are all using slave alternatives instead of dedicated master alternatives for kubectl-completion.

I double-checked with cumin, there is no kubectl-completion alternatives group anymore for hosts which use the kubernetes-client package:

jelto@cumin1002:~$ sudo cumin 'R:k8s::package%package=client' 'update-alternatives --display kubectl-completion'
27 hosts will be targeted:
...
OK to proceed on 27 hosts? Enter the number of affected hosts to confirm or "q" to quit: 27
===== NODE GROUP =====
(27) ...
----- OUTPUT of 'update-alternati...bectl-completion' -----
update-alternatives: error: no alternatives for kubectl-completion

The output of sudo cumin 'R:k8s::package%package=client' 'update-alternatives --display kubectl' also looks reasonable, although it differs between nodes because of their different setups (only 1.31 installed, both 1.31 and 1.23 installed, or only 1.23 installed). All of them define kubectl-completion as a slave. If we remove 1.23 from the nodes and move everything to 1.31, the output should be cleaner.
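Instead of reading the output by eye, the slave check could be scripted. The following is a minimal sketch that matches the English `--display` format shown earlier in this task (so it assumes an untranslated locale, e.g. `LC_ALL=C`); `has_slave` is a hypothetical helper name.

```shell
#!/bin/sh
# has_slave is a hypothetical helper: it reads the output of
# `update-alternatives --display <master>` on stdin and succeeds if the
# given name is listed as a slave link of that group.
# Note: the name is used inside a regular expression, so a "." in it
# would match any character.
has_slave() {
  grep -Eq "^[[:space:]]+slave $1 is "
}

# Example:
#   LC_ALL=C update-alternatives --display kubectl | has_slave kubectl-completion
```

The exit status makes it usable as the remote command in a fleet-wide check, where a non-zero status flags hosts that still need attention.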

The last missing part is removing the extra code from operations/debs/kubernetes again and managing just the single alternative.

Change #1163695 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/debs/kubernetes@v1.31] cleanup prerm script update-alternatives command

https://gerrit.wikimedia.org/r/1163695

Change #1163695 merged by Jelto:

[operations/debs/kubernetes@v1.31] cleanup prerm script update-alternatives command

https://gerrit.wikimedia.org/r/1163695

Mentioned in SAL (#wikimedia-operations) [2025-06-25T10:47:25Z] <jelto> import kubernetes 1.31.4-6 to apt host - T387548

All hosts have been updated to use the kubernetes-client and helm3 packages, which use slave alternative groups. I also updated or removed the old packages from the hosts where possible.

So I'll resolve this task. Thanks @JMeybohm for the help!