Modify webservice and maintain-kubeusers to allow switching to the new cluster
Closed, Resolved · Public

Description

In order to migrate users to the new cluster without destroying access to the old one at the same time, both webservice and maintain-kubeusers (new and old editions) must be capable of shifting a tool to the new cluster and, ideally, back.

One way to accomplish this is to merge the kubeconfigs in both editions of maintain-kubeusers. Then webservice and kubectl can simply set the correct context, allowing smooth migration in either direction.
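A rough sketch of the merge idea (assuming standard kubeconfig YAML; entry names are illustrative): new cluster, context, and user entries are added alongside the existing ones, replacing only same-named entries, while current-context is left alone until the tool is actually migrated.

import yaml

def merge_kubeconfig(path, cluster, context, user):
    # cluster/context/user are dicts shaped like kubeconfig entries,
    # e.g. {"name": "new-cluster", "cluster": {...}}.
    with open(path) as f:
        config = yaml.safe_load(f)
    for key, entry in (("clusters", cluster), ("contexts", context), ("users", user)):
        # Drop any same-named entry, then append the new one.
        kept = [e for e in config.get(key, []) if e["name"] != entry["name"]]
        kept.append(entry)
        config[key] = kept
    with open(path, "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False)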

Related Objects

Event Timeline

Bstorm triaged this task as Medium priority. Oct 22 2019, 7:05 PM
Bstorm created this task.
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.
Bstorm added subscribers: aborrero, Phamhi.

Don't forget T159892 in this one. It likely is WAY out of scope, but it is worth keeping in mind in case something doesn't work.

Change 545415 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] merging: merge configs when one already exists

https://gerrit.wikimedia.org/r/545415

I think the big caveat of the approach proposed in that patch is that the kubeconfigs need to be writable for it all to work as a migration mechanism. However, using the kubectl config use-context command to switch back and forth would be very supportable. If both maintain-kubeusers apps are taught to be polite when they want to change a config file, it would be safe.

I'd just need to port the code to the older one as well so it waits for locks and merges things tidily.
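For instance, "polite" could mean taking an exclusive advisory lock before rewriting the file, so the two editions serialize their writes instead of racing. A minimal sketch (the function name is illustrative):

import fcntl

def rewrite_politely(path, new_contents):
    with open(path, "r+") as f:
        # Blocks until any other maintain-kubeusers edition releases the lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        f.seek(0)
        f.write(new_contents)
        f.truncate()
    # The lock is released when the file is closed.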

Now, to look at webservice.

Today I learned that there is actually an actively developed fork of pykube: https://github.com/hjacobs/pykube
That said, the primitives for auth and such in the official Python client are a tad better, especially for recognizing service account credentials.
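For example, the official client can pick up the service account credentials mounted into a pod automatically and fall back to a kubeconfig when running outside the cluster. A minimal sketch:

from kubernetes import client, config

try:
    # Inside a pod: use the token and CA that Kubernetes mounts for the
    # pod's service account.
    config.load_incluster_config()
except config.ConfigException:
    # Outside the cluster: fall back to ~/.kube/config (or $KUBECONFIG).
    config.load_kube_config()

v1 = client.CoreV1Api()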

Change 545966 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] maintain-kubeusers: add ability to merge and update configs

https://gerrit.wikimedia.org/r/545966

Change 545966 merged by Bstorm:
[operations/puppet@production] maintain-kubeusers: add ability to merge and update configs

https://gerrit.wikimedia.org/r/545966

Change 545415 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] merging: merge configs when one already exists

https://gerrit.wikimedia.org/r/545415

Change 547676 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] new k8s: adjust things to be compatible with migration to the new cluster

https://gerrit.wikimedia.org/r/547676

Change 547676 merged by Bstorm:
[operations/software/tools-webservice@master] new k8s: adjust things to be compatible with migration to the new cluster

https://gerrit.wikimedia.org/r/547676

The changes to maintain-kubeusers and such worked exactly as desired.
What didn't work on the first test:

Ingress

toolsbeta.test@toolsbeta-sgebastion-04:~$ webservice --backend=kubernetes python start
Traceback (most recent call last):
  File "/usr/local/bin/webservice", line 211, in <module>
    start(job, 'Starting webservice')
  File "/usr/local/bin/webservice", line 95, in start
    job.request_start()
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/backends/kubernetesbackend.py", line 555, in request_start
    pykube.Ingress(self.api, self._get_ingress()).create()
  File "/usr/lib/python2.7/dist-packages/pykube/objects.py", line 76, in create
    self.api.raise_for_status(r)
  File "/usr/lib/python2.7/dist-packages/pykube/http.py", line 104, in raise_for_status
    raise HTTPError(payload["message"])
pykube.exceptions.HTTPError: Ingress in version "v1beta1" cannot be handled as a Ingress: v1beta1.Ingress.Spec: v1beta1.IngressSpec.Rules: []v1beta1.IngressRule: v1beta1.IngressRule.IngressRuleValue: HTTP: readObjectStart: expect { or n, but found [, error found in #10 byte of ...| "http": [{"path": "|..., bigger context ...|{"rules": [{"host": "tools.wmflabs.org", "http": [{"path": "/test", "backend": {"serviceName": "test|...

That may just be a formatting error on my part, but it also might be because the format changed quite a bit between API versions, and possibly between pykube versions as well. Will find out.

Also, I mistook the lack of cascading deletes for a limitation in k8s...it's a limitation of pykube. I need to ensure cascades are done by hand in pykube for the new cluster as well.
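A hand-rolled cascade might look like the following sketch (the namespace and label selector are illustrative): delete the Deployment, then sweep up its ReplicaSets and Pods by label, since pykube won't propagate the delete itself.

import pykube

api = pykube.HTTPClient(pykube.KubeConfig.from_file("~/.kube/config"))
selector = {"name": "test"}  # illustrative label
for kind in (pykube.Deployment, pykube.ReplicaSet, pykube.Pod):
    # Delete top-down so nothing respawns the objects below it.
    for obj in kind.objects(api).filter(namespace="test", selector=selector):
        obj.delete()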

Finally: anything touching kubectl is a problem in the new cluster. /usr/bin/kubectl is 1.10.6 and works ok with our 1.15.1-5 cluster. /usr/local/bin/kubectl is 1.4.12 and doesn't understand the rbac interaction with the new cluster at all. You also cannot use the /usr/bin/kubectl binary on the old cluster. This is a bit of a complication for the otherwise simple cluster switch via the kubectl config use-context default command (which does work in either direction with either version at least).

Ok, the problem was my format for the ingress (which I am fixing). The current ingress setup won't work with the existing toolsbeta routing, and I couldn't edit the ingress object with the 1.10.6 version of kubectl; we will need a newer one installed for people to use, since it throws random errors in unexpected places.
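For reference, the error above shows "http" as a bare list, whereas the v1beta1 API expects it to be an object wrapping a "paths" list. The shape it wants (host, path, and backend name mirror the error message; the port is a placeholder):

ingress_spec = {
    "rules": [
        {
            "host": "tools.wmflabs.org",
            "http": {
                # "http" is an object containing "paths", not a list of rules.
                "paths": [
                    {
                        "path": "/test",
                        "backend": {"serviceName": "test", "servicePort": 8000},
                    }
                ]
            },
        }
    ]
}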

Will be fixed in a patch coming right up.

Currently, webservice has no idea what "project" you are in. @bd808 any reason not to have it grok that from /etc/wmcs_project?

> Currently, webservice has no idea what "project" you are in. @bd808 any reason not to have it grok that from /etc/wmcs_project?

We would need to mount /etc/wmcs_project into the Pods for that to work I think, but other than that no reason not to.

> We would need to mount /etc/wmcs_project into the Pods for that to work I think, but other than that no reason not to.

  1. We do! :)
  2. I'm actually just talking about the webservice script itself and that runs (usually) on a VM, so it should be ok.

This would allow me to have it produce a correct ingress object for toolsbeta.

Hrm...that frontend is used in the pods, too, isn't it? Well, either way. It should work.

> 1. We do! :)

You are right, /etc/wmcs-project is there. We added that in T192244: Provide a consistent way to identify operation in toolforge (including k8s) and I had forgotten. :)

> Hrm...that frontend is used in the pods, too, isn't it? Well, either way. It should work.

Reviewing T190893: Setup the webservice-related instances in toolsbeta I think there may already be some part of what you need built into webservice, specifically https://gerrit.wikimedia.org/r/#/c/operations/software/tools-webservice/+/430647/

Doesn't look quite like it. I need the contents inserted into the KubernetesBackend object with a self.project assignment (so I'll just do a with open). That patch just mounts the file in the old cluster (the new cluster does it automatically with a PodPreset, because I wanted to make sure the resulting pods looked roughly the same). I need to commit my fix for the ingress object anyway :)
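Something like the following inside KubernetesBackend's constructor (a sketch; path spelling per the comment above):

# In KubernetesBackend.__init__:
with open("/etc/wmcs-project") as f:
    self.project = f.read().strip()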

Change 549613 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] new k8s: Fix ingress object and enable toolsbeta ingress creation

https://gerrit.wikimedia.org/r/549613

Change 549661 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] kubectl: upgrade /usr/bin/kubectl to 1.15.5

https://gerrit.wikimedia.org/r/549661

Just noticed this:

toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.12", GitCommit:"19e81afecf5eb2b7838c35e2cbf776aff04dc34c", GitTreeState:"clean", BuildDate:"2017-04-20T21:01:06Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get pods
error: group map[federation:0xc820369f80 :0xc8203b6460 authorization.k8s.io:0xc8203b6690 rbac.authorization.k8s.io:0xc820368000 apps:0xc8203b64d0 autoscaling:0xc8203b6700 extensions:0xc8203b6a80 componentconfig:0xc8203b69a0 storage.k8s.io:0xc820368070 authentication.k8s.io:0xc8203b6540 batch:0xc8203b68c0 certificates.k8s.io:0xc8203b6930 policy:0xc8203b6af0 networking.k8s.io:0xc820205420] is already registered

toolsbeta.test@toolsbeta-sgebastion-04:~$ /usr/bin/kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

toolsbeta.test@toolsbeta-sgebastion-04:~$ /usr/bin/kubectl get pods
runtime: failed to create new OS thread (have 30 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

runtime stack:
[.. ugly golang stack trace ..]

i.e., the old kubectl 1.4.12 can't properly interact with the new cluster (I believe you were already aware of this), and the new one is too constrained by the systemd limits on the bastion?

Mentioned in SAL (#wikimedia-cloud) [2019-11-08T11:58:14Z] <arturo> adding profile::toolforge::bastion::nproc: 100 to puppet prefix toolsbeta-sgebastion (T236202)

I live-hacked the hiera change into the server (so as not to destroy your other live-hacks in the bastion):

aborrero@toolsbeta-sgebastion-04:~ 5s $ sudo nano /etc/security/limits.conf
aborrero@toolsbeta-sgebastion-04:~ 5s $ sudo become test
toolsbeta.test@toolsbeta-sgebastion-04:~$ /usr/bin/kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
test-5d5f87b66-2hfvv   1/1     Running   0          13h

Change 549613 merged by Bstorm:
[operations/software/tools-webservice@master] new k8s: Fix ingress object and enable toolsbeta ingress creation

https://gerrit.wikimedia.org/r/549613

Change 549661 merged by Bstorm:
[operations/puppet@production] kubectl: upgrade /usr/bin/kubectl to 1.15.5

https://gerrit.wikimedia.org/r/549661

Mentioned in SAL (#wikimedia-cloud) [2019-11-26T22:57:34Z] <bstorm_> push upgraded webservice 0.52 to the buster and jessie repos for container rebuilds T236202

Mentioned in SAL (#wikimedia-cloud) [2019-11-26T23:25:16Z] <bstorm_> rebuilding docker images to include the new webservice 0.52 in all versions instead of just the stretch ones T236202

Change 554609 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] migration: Add force-migrate option and simplify code

https://gerrit.wikimedia.org/r/554609

Change 554903 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] migration: add option to switch all remaining users to new cluster

https://gerrit.wikimedia.org/r/554903

Change 554912 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] testing: add tests for migrating users and make user tests better

https://gerrit.wikimedia.org/r/554912

Change 554609 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] tidyup: Refactor and simplify code

https://gerrit.wikimedia.org/r/554609

Change 554903 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] migration: add option to switch all remaining users to new cluster

https://gerrit.wikimedia.org/r/554903

Change 554912 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] testing: add tests for migrating users and make user tests better

https://gerrit.wikimedia.org/r/554912

TODO: We also need to persist the new --mem and --cpu values into service.manifest so that webservice restart can pick them up and reapply them to the new Deployment.
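A minimal sketch of that persistence, assuming hypothetical manifest keys (the real service.manifest schema may differ):

import os
import yaml

def persist_limits(manifest_path, cpu=None, mem=None):
    # Load the existing manifest if present, then record the new limits
    # so a later `webservice restart` can reapply them.
    manifest = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as f:
            manifest = yaml.safe_load(f) or {}
    if cpu:
        manifest["cpu"] = cpu
    if mem:
        manifest["mem"] = mem
    with open(manifest_path, "w") as f:
        yaml.safe_dump(manifest, f, default_flow_style=False)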

Change 562996 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] Apply black formatting and make the webservice frontend pass flake8

https://gerrit.wikimedia.org/r/562996

Change 563003 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] kubernetes: persist the cpu and mem args in service manifests

https://gerrit.wikimedia.org/r/563003

Change 562996 merged by jenkins-bot:
[operations/software/tools-webservice@master] Apply black formatting and make the webservice frontend pass flake8

https://gerrit.wikimedia.org/r/562996

Change 563003 merged by Bstorm:
[operations/software/tools-webservice@master] kubernetes: persist the cpu and mem args in service manifests

https://gerrit.wikimedia.org/r/563003

Change 563592 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] k8s: Set default requests for the new cluster

https://gerrit.wikimedia.org/r/563592

Change 563592 merged by Bstorm:
[operations/software/tools-webservice@master] k8s: Set default requests for the new cluster

https://gerrit.wikimedia.org/r/563592
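In the new cluster the scheduler needs resource requests alongside limits to place pods sensibly. Illustratively (these values are placeholders, not the patch's actual defaults):

resources = {
    "requests": {"cpu": "0.125", "memory": "256Mi"},  # reserved for scheduling
    "limits": {"cpu": "0.5", "memory": "512Mi"},      # hard cap at runtime
}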

Change 563624 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/software/tools-webservice@master] k8s: Don't restart all k8s machinery to reboot a basic webservice

https://gerrit.wikimedia.org/r/563624

Change 563624 merged by Bstorm:
[operations/software/tools-webservice@master] k8s: Don't restart all k8s machinery to reboot a basic webservice

https://gerrit.wikimedia.org/r/563624

This piece is really done.