
[infra] Fix the mis-named k8s service in tools and toolsbeta projects
Open, Medium, Public

Description

We created the new Toolforge Kubernetes with a DNS name that doesn't follow convention back in T236826: Toolforge: new k8s: initial build of the new kubernetes cluster.

The existing name is k8s.tools.eqiad1.wikimedia.cloud and it should be k8s.svc.tools.eqiad1.wikimedia.cloud. We did the same thing with toolsbeta.

Unfortunately everyone's config has the existing name, and I'm sure maintain-kubeusers doesn't know how to switch it without some development. The cluster has a name, but I *think* that name is entirely different anyway. We may be fine there, but we'd want to be careful about puppet, user auth and similar.
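As a rough illustration of the config change involved, here is a minimal Python sketch that swaps the existing hostname for the conventional one in an API server URL such as the `server:` value of a kubeconfig. The function name `rewrite_server_url` is hypothetical and not part of maintain-kubeusers; the port and paths used when calling it are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames taken from the task description.
OLD_HOST = "k8s.tools.eqiad1.wikimedia.cloud"
NEW_HOST = "k8s.svc.tools.eqiad1.wikimedia.cloud"


def rewrite_server_url(url: str, old: str = OLD_HOST, new: str = NEW_HOST) -> str:
    """Return url with the old API server host swapped for the new one.

    URLs that point at any other host are returned unchanged.
    (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(url)
    if parts.hostname != old:
        return url
    # Keep the port (if any); userinfo is not expected in a kubeconfig
    # server URL and is not preserved here.
    netloc = new if parts.port is None else f"{new}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

The `old` and `new` parameters are there so the same helper could be pointed at the toolsbeta names. Clients will only accept the new name once the API server certificate lists it, so a rewrite like this would have to come after the certificate work.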

Event Timeline

Bstorm triaged this task as Medium priority. Sep 10 2020, 4:05 PM
Bstorm created this task.
Bstorm renamed this task from Fix the mis-named k8s service in tools project to Fix the mis-named k8s service in tools and toolsbeta projects. Sep 10 2020, 5:01 PM
Bstorm updated the task description.

The init file info should be manually edited into the configmap so that the change takes effect when we upgrade. (Just saying it here so it's known to all, including those who didn't read the patch chatter.)

Mentioned in SAL (#wikimedia-cloud) [2021-05-15T07:52:07Z] <Majavah> set profile::wmcs::kubeadm::control::apiserver_cert_alternative_names hiera key and adjust config map T262562

Updating the config map seems to have done nothing, even after two Kubernetes upgrades. The documentation says that renewing certificates will use the previous certs as the source for the data. So I wonder if we need to replace the control nodes to get completely new certificates?

We can manually replace them. It's not a lot of fun, but I've done it before. That said, replacing the control plane is effectively replacing the cluster since the entire cluster's PKI is based on it. OK, maybe not entirely because you can preserve the CA...

I mean, I did it with openssl. Other apps are a lot easier, and you could double check your products before moving the new certs in place (comparing the -text output). If the cluster keys off the old certs for source data, that should work.
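The "comparing the -text output" check could be sketched roughly like this in Python. It assumes the usual `openssl x509 -text` layout, where the Subject Alternative Name entries sit on the line after the `X509v3 Subject Alternative Name:` header; the helper names `extract_sans` and `san_diff` are hypothetical.

```python
import re
from typing import Set, Tuple


def extract_sans(x509_text: str) -> Set[str]:
    """Return the SAN entries found in `openssl x509 -text` output.

    Entries keep their type prefix, e.g. "DNS:example.org".
    An empty set is returned when the extension is absent.
    """
    match = re.search(
        r"X509v3 Subject Alternative Name:[^\n]*\n\s*([^\n]+)", x509_text
    )
    if match is None:
        return set()
    return {entry.strip() for entry in match.group(1).split(",") if entry.strip()}


def san_diff(old_text: str, new_text: str) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) SAN entries going from the old cert to the new."""
    old, new = extract_sans(old_text), extract_sans(new_text)
    return new - old, old - new
```

Fed the `-text` dumps of the current API server certificate and a manually created replacement, `san_diff` would be expected to report only the new service name as added and nothing as removed. It looks at SANs only, so the rest of the output would still need a manual look.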

The only thing that might be nice is testing a kubeadm certs refresh in toolsbeta after the manually created certs are in place. That plays the phase that kubeadm uses for upgrades, so we'll know for sure if we have succeeded :)

So yeah, I propose we try just making new certs and keys based on the cluster CA and try inserting them then renewing them with kubeadm to see how it acts.

taavi removed taavi as the assignee of this task. Sep 18 2023, 5:35 PM
taavi claimed this task.

I'm still planning to do this when we next refresh the toolforge control plane nodes.

dcaro renamed this task from Fix the mis-named k8s service in tools and toolsbeta projects to [toolforge,k8s] Fix the mis-named k8s service in tools and toolsbeta projects. Mar 5 2024, 4:11 PM
dcaro renamed this task from [toolforge,k8s] Fix the mis-named k8s service in tools and toolsbeta projects to [infra] Fix the mis-named k8s service in tools and toolsbeta projects. Mar 5 2024, 5:15 PM
taavi removed taavi as the assignee of this task. Tue, Jun 25, 3:35 PM