
Modify webservice and maintain-kubeusers to allow switching to the new cluster
Open, Normal, Public

Description

In order to migrate users to the new cluster and not destroy access to the old one at the same time, both webservice and maintain-kubeusers (new and old editions) must be capable of shifting a tool to the new cluster and, ideally, back.

One way to accomplish this is to merge the kubeconfigs in both editions of maintain-kubeusers. Then webservice and kubectl can simply set the correct context, allowing smooth migration in either direction.
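The merge described above can be sketched in a few lines of Python. This is an illustrative sketch, not the actual maintain-kubeusers code: the helper name, the sample cluster/context names (`old`, `toolforge`), and the token placeholders are all hypothetical; only the section layout (`clusters`, `contexts`, `users`, `current-context`) follows the standard kubeconfig structure.

```python
# Hypothetical sketch: fold a second cluster's kubeconfig entries into an
# existing config, so `kubectl config use-context` can select either cluster.
def merge_kubeconfigs(existing, new, default_context=None):
    """Append clusters/contexts/users from `new` into `existing`,
    skipping entries whose names are already present."""
    merged = dict(existing)
    for section in ("clusters", "contexts", "users"):
        have = {entry["name"] for entry in merged.get(section, [])}
        merged[section] = merged.get(section, []) + [
            entry for entry in new.get(section, []) if entry["name"] not in have
        ]
    if default_context:
        merged["current-context"] = default_context
    return merged

old = {
    "clusters": [{"name": "old", "cluster": {"server": "https://old:6443"}}],
    "contexts": [{"name": "old", "context": {"cluster": "old", "user": "tool"}}],
    "users": [{"name": "tool", "user": {"token": "placeholder"}}],
    "current-context": "old",
}
new = {
    "clusters": [{"name": "toolforge", "cluster": {"server": "https://new:6443"}}],
    "contexts": [{"name": "toolforge", "context": {"cluster": "toolforge", "user": "tool"}}],
    "users": [{"name": "tool", "user": {"token": "placeholder"}}],
}
merged = merge_kubeconfigs(old, new, default_context="toolforge")
```

With a merged config in place, switching direction is just `kubectl config use-context old` or `kubectl config use-context toolforge`.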


Event Timeline

Bstorm triaged this task as Normal priority. Oct 22 2019, 7:05 PM
Bstorm created this task.
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.
Bstorm added subscribers: aborrero, Phamhi.

Don't forget T159892 in this one. It likely is WAY out of scope, but it is worth keeping in mind in case something doesn't work.

Change 545415 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] merging: merge configs when one already exists

https://gerrit.wikimedia.org/r/545415

I think the big thing with the approach proposed in that patch is that the kube configs need to be writable for it all to work as a migration mechanism. However, using the kubectl config use-context command to switch back and forth would be very supportable. If both maintain-kubeusers apps are taught to be polite when they want to change a config file, it would be safe.

I'd just need to port the code to the older one as well so it waits for locks and merges things tidily.
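The "waits for locks and merges things tidily" idea might look roughly like the sketch below: take a blocking advisory lock on the config before rewriting it, so neither edition of maintain-kubeusers clobbers the other's changes. This is an assumption-laden sketch, not the real implementation; it uses JSON purely to keep the example stdlib-only (real kubeconfigs are YAML), and the function and file names are invented.

```python
# Illustrative only: serialize writers on one config file with flock, and
# mutate the parsed config in place rather than overwriting it wholesale.
import fcntl
import json
import os
import tempfile

def update_config_politely(path, mutate):
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # block until any other writer finishes
        try:
            config = json.load(f)
            mutate(config)               # merge changes instead of replacing the file
            f.seek(0)
            f.truncate()
            json.dump(config, f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

# Tiny demonstration against a throwaway file:
path = os.path.join(tempfile.mkdtemp(), "config")
with open(path, "w") as f:
    json.dump({"contexts": []}, f)
update_config_politely(path, lambda c: c["contexts"].append({"name": "toolforge"}))
with open(path) as f:
    result = json.load(f)
```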

Now, to look at webservice.

Today I learned that there actually is an actively developed version of pykube: https://github.com/hjacobs/pykube
That said, the primitives for auth and such in the official python client are a tad better, especially for recognizing service account creds.

Change 545966 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] maintain-kubeusers: add ability to merge and update configs

https://gerrit.wikimedia.org/r/545966

Change 545966 merged by Bstorm:
[operations/puppet@production] maintain-kubeusers: add ability to merge and update configs

https://gerrit.wikimedia.org/r/545966

Change 545415 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] merging: merge configs when one already exists

https://gerrit.wikimedia.org/r/545415

Change 547676 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] newk8s: adjust things to be compatible with migration to the new cluster

https://gerrit.wikimedia.org/r/547676

Change 547676 merged by Bstorm:
[operations/software/tools-webservice@master] new k8s: adjust things to be compatible with migration to the new cluster

https://gerrit.wikimedia.org/r/547676

Bstorm added a comment (edited). Wed, Nov 6, 11:16 PM

The changes to maintain-kubeusers and such worked exactly as desired.
What didn't work on the first test:

Ingress

toolsbeta.test@toolsbeta-sgebastion-04:~$ webservice --backend=kubernetes python start
Traceback (most recent call last):
  File "/usr/local/bin/webservice", line 211, in <module>
    start(job, 'Starting webservice')
  File "/usr/local/bin/webservice", line 95, in start
    job.request_start()
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/backends/kubernetesbackend.py", line 555, in request_start
    pykube.Ingress(self.api, self._get_ingress()).create()
  File "/usr/lib/python2.7/dist-packages/pykube/objects.py", line 76, in create
    self.api.raise_for_status(r)
  File "/usr/lib/python2.7/dist-packages/pykube/http.py", line 104, in raise_for_status
    raise HTTPError(payload["message"])
pykube.exceptions.HTTPError: Ingress in version "v1beta1" cannot be handled as a Ingress: v1beta1.Ingress.Spec: v1beta1.IngressSpec.Rules: []v1beta1.IngressRule: v1beta1.IngressRule.IngressRuleValue: HTTP: readObjectStart: expect { or n, but found [, error found in #10 byte of ...| "http": [{"path": "|..., bigger context ...|{"rules": [{"host": "tools.wmflabs.org", "http": [{"path": "/test", "backend": {"serviceName": "test|...

That may just be a formatting error on my part, but it also might be because the format changed quite a bit between versions of the API and possibly pykube as well. Will find out.
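For what it's worth, the error message above hints at the likely shape problem: `http` is being sent as a list, while in the extensions/v1beta1 Ingress API each rule's `http` value is an object containing a `paths` list. The sketch below contrasts the two shapes; the service name and port are illustrative, taken from the error text, not from the actual webservice code.

```python
# What the API server rejected: `http` given directly as a list of paths.
broken_rule = {
    "host": "tools.wmflabs.org",
    "http": [
        {"path": "/test",
         "backend": {"serviceName": "test", "servicePort": 8000}}
    ],
}

# The v1beta1 shape: `http` is an object whose `paths` key holds the list.
fixed_rule = {
    "host": "tools.wmflabs.org",
    "http": {
        "paths": [
            {"path": "/test",
             "backend": {"serviceName": "test", "servicePort": 8000}}
        ]
    },
}
```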

Also, I mistook the cascading-delete problem for a limitation of k8s... it's actually a limitation of pykube. I need to ensure cascades are done by hand in pykube for the new cluster as well.
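Doing the cascade by hand essentially means walking ownerReferences from the root object (e.g. a Deployment) down through ReplicaSets to Pods and deleting each in turn. The sketch below shows only that traversal logic on plain dicts; it is not pykube code, and the UIDs and object shapes are invented for illustration.

```python
# Hypothetical sketch of manual cascading deletion: collect everything
# transitively owned by a root object, parents before children. In real
# code each returned object would then be deleted via the API.
def owned_by(root_uid, objects):
    """Return objects transitively owned by root_uid, parents first."""
    to_delete, seen, frontier = [], set(), {root_uid}
    while frontier:
        next_frontier = set()
        for obj in objects:
            uid = obj["metadata"]["uid"]
            owners = {ref["uid"]
                      for ref in obj["metadata"].get("ownerReferences", [])}
            if uid not in seen and owners & frontier:
                to_delete.append(obj)
                seen.add(uid)
                next_frontier.add(uid)
        frontier = next_frontier
    return to_delete

deployment_uid = "dep-1"
objects = [
    {"kind": "ReplicaSet",
     "metadata": {"uid": "rs-1", "ownerReferences": [{"uid": "dep-1"}]}},
    {"kind": "Pod",
     "metadata": {"uid": "pod-1", "ownerReferences": [{"uid": "rs-1"}]}},
]
cascade = owned_by(deployment_uid, objects)
```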

Finally: anything touching kubectl is a problem in the new cluster. /usr/bin/kubectl is 1.10.6 and works ok with our 1.15.1-5 cluster. /usr/local/bin/kubectl is 1.4.12 and doesn't understand the rbac interaction with the new cluster at all. You also cannot use the /usr/bin/kubectl binary on the old cluster. This is a bit of a complication for the otherwise simple cluster switch via the kubectl config use-context default command (which does work in either direction with either version at least).

Ok, the problem was my format for the ingress (which I am fixing). The current ingress setup won't work with the existing toolsbeta routing, but I couldn't edit the ingress object with the 1.10.6 version of kubectl. Since it throws random errors in unexpected places, we will need a newer version installed for people to use.

Will be fixed in a patch coming right up.

Bstorm added a comment. Thu, Nov 7, 3:59 PM

Currently, webservice has no idea what "project" you are in. @bd808 any reason not to have it grok that from /etc/wmcs_project?

Currently, webservice has no idea what "project" you are in. @bd808 any reason not to have it grok that from /etc/wmcs_project?

We would need to mount /etc/wmcs_project into the Pods for that to work I think, but other than that no reason not to.

Bstorm added a comment. Thu, Nov 7, 5:07 PM

We would need to mount /etc/wmcs_project into the Pods for that to work I think, but other than that no reason not to.

  1. We do! :)
  2. I'm actually just talking about the webservice script itself and that runs (usually) on a VM, so it should be ok.

This would allow me to have it produce a correct ingress object for toolsbeta.

Bstorm added a comment. Thu, Nov 7, 5:08 PM

Hrm...that frontend is used in the pods, too, isn't it. Well, either way. It should work.

  1. We do! :)

You are right, /etc/wmcs-project is there. We added that in T192244: Provide a consistent way to identify operation in toolforge (including k8s) and I had forgotten. :)

Hrm...that frontend is used in the pods, too, isn't it. Well, either way. It should work.

Reviewing T190893: Setup the webservice-related instances in toolsbeta I think there may already be some part of what you need built into webservice, specifically https://gerrit.wikimedia.org/r/#/c/operations/software/tools-webservice/+/430647/

Bstorm added a comment. Thu, Nov 7, 5:35 PM

Doesn't look quite like it. I need the contents inserted into the KubernetesBackend object with a self.project assignment (so I'll just do a with open). That just mounts it in the old cluster (the new cluster does it automatically with PodPreset because I wanted to make sure the resulting pods looked roughly the same). I need to commit my fix for the ingress object anyway :)
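The "with open" approach mentioned above could look like this sketch: read the project name from the marker file at startup, with a fallback so the code still behaves on hosts that lack the file. The helper name, the default value, and the exact path are assumptions (both `/etc/wmcs_project` and `/etc/wmcs-project` spellings appear earlier in this thread); this is not the actual KubernetesBackend code.

```python
# Illustrative helper: determine the Cloud VPS project from the marker file,
# e.g. so webservice can build a correct ingress host for toolsbeta.
import os
import tempfile

def detect_project(path="/etc/wmcs-project", default="tools"):
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return default

# Demonstration against a throwaway file standing in for the real path:
p = os.path.join(tempfile.mkdtemp(), "wmcs-project")
with open(p, "w") as f:
    f.write("toolsbeta\n")
found = detect_project(p)
missing = detect_project(os.path.join(tempfile.mkdtemp(), "absent"))
```

In KubernetesBackend this would presumably end up as something like `self.project = detect_project()` during initialization.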

Change 549613 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] new k8s: Fix ingress object and enable toolsbeta ingress creation

https://gerrit.wikimedia.org/r/549613

Change 549661 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] kubectl: upgrade /usr/bin/kubectl to 1.15.5

https://gerrit.wikimedia.org/r/549661

Just noticed this:

toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.12", GitCommit:"19e81afecf5eb2b7838c35e2cbf776aff04dc34c", GitTreeState:"clean", BuildDate:"2017-04-20T21:01:06Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get pods
error: group map[federation:0xc820369f80 :0xc8203b6460 authorization.k8s.io:0xc8203b6690 rbac.authorization.k8s.io:0xc820368000 apps:0xc8203b64d0 autoscaling:0xc8203b6700 extensions:0xc8203b6a80 componentconfig:0xc8203b69a0 storage.k8s.io:0xc820368070 authentication.k8s.io:0xc8203b6540 batch:0xc8203b68c0 certificates.k8s.io:0xc8203b6930 policy:0xc8203b6af0 networking.k8s.io:0xc820205420] is already registered

toolsbeta.test@toolsbeta-sgebastion-04:~$ /usr/bin/kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

toolsbeta.test@toolsbeta-sgebastion-04:~$ /usr/bin/kubectl get pods
runtime: failed to create new OS thread (have 30 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

runtime stack:
[.. ugly golang stack trace ..]

i.e., the old kubectl 1.4.12 can't properly interact with the new cluster (I believe you were already aware of this), and the new one is too constrained by systemd on the bastion?

Mentioned in SAL (#wikimedia-cloud) [2019-11-08T11:58:14Z] <arturo> adding profile::toolforge::bastion::nproc: 100 to puppet prefix toolsbeta-sgebastion (T236202)

I live-hacked the hiera change into the server (so as not to destroy your other live-hacks on the bastion):

aborrero@toolsbeta-sgebastion-04:~ 5s $ sudo nano /etc/security/limits.conf
aborrero@toolsbeta-sgebastion-04:~ 5s $ sudo become test
toolsbeta.test@toolsbeta-sgebastion-04:~$ /usr/bin/kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
test-5d5f87b66-2hfvv   1/1     Running   0          13h

Change 549613 merged by Bstorm:
[operations/software/tools-webservice@master] new k8s: Fix ingress object and enable toolsbeta ingress creation

https://gerrit.wikimedia.org/r/549613

Change 549661 merged by Bstorm:
[operations/puppet@production] kubectl: upgrade /usr/bin/kubectl to 1.15.5

https://gerrit.wikimedia.org/r/549661