Page MenuHomePhabricator

Toolforge: changes to maintain-kubeusers
Open, NormalPublic


The maintain-kubeusers script is the mechanism we currently use to create, populate and maintain the Toolforge tools information into Kubernetes.

Some stuff that the script currently does, when dealing with new tools:

  • fetch tool information from LDAP
  • create the homedir in NFS
  • generates a token-based auth
  • generates the ABAC config
  • create a k8s namespace per tool
  • writes a kubeconfig file per tool with the corresponding token and place it in the homedir
  • restart the kube-apiserver service to read new changes

In the new k8s we are trying to move away from ABAC/token and use RBAC/x509 instead, so this script will require some updates to handle this new situation.
Also, the script currently runs in the master nodes. We could probably try something nicer like running this inside k8s itself (define a static pod by means of puppet?)

Event Timeline

aborrero created this task.Jul 19 2019, 9:25 AM
aborrero updated the task description. (Show Details)Jul 19 2019, 9:28 AM

T227290 and the RBAC structure will be changing some things. I also realized that the mechanism is vastly more flexible than the current setup (and doesn't involve restarting the api-server).

The process that automates things (maintain-kubeusers 2.0?) can run inside kubernetes because of how it will work (and benefit from the redundancy there) because of that, and it is likely to be quite a different sort of script in the end.

Just cross-referencing notes.

Bstorm triaged this task as Normal priority.Jul 29 2019, 4:24 PM

Ok, so it seems like the way to do this is:

  • Get a list of all the users from LDAP
  • Get a list of namespaces/configmaps in k8s for each toolforge user
  • Do a diff, find new users and users with deleted configmaps
  • For each new user or removed configmap:
    • Create new namespace (only for a new user)
    • generate a CSR (including the right group for RBAC/PSP)
    • Validate and approve the CSR
    • Drop the .kube/config file in the tool directory
    • Annotate the namespace with configmap

Using the configmap with basically just "kubeconfigured: true" in it, will provide a workaround for not having a user database to compare to in k8s. This will also restrict k8s usage to tools, not individual users, which is how the whole system is designed already. Since groups are permitted on certs, the RBAC can reference a group, thus the script won't have to interact with that. The RBAC should be defined simply at the cluster level with a ClusterRoleBinding.

That will simplify tools to sharing one role that can have a defined set of policies and capabilities here: T227290

Making great progress using python to talk to the certs API. Once things work a bit better, I'll commit some code.

Found breaking changes between v7.0.0 (buster) and v10.0.0 (current) of the python-kubernetes library. Writing the code that works with both of these only.

Bstorm added a comment.EditedJul 30 2019, 3:31 PM

So it turns out that in the certificates API of Kubernetes, CSRs, which include the certificate, can be kept around largely indefinitely. This feels kind of like keeping everyone's password in a secret file, so I'm going to proceed with deleting them after use and using config maps to note who has a cert.

That said, the cert cannot actually be used without the PK distributed to the user. So we *can* just use that as the "user database". I prefer the config maps, though.

Since I am currently hoping to deploy this into kubernetes, requesting a gerrit repo at labs/tools/maintain-kubeusers. If we end up deploying on a server instead (such as an NFS host or something like that), we should be able to just as easily deploy via puppet instead. I am ensuring the code tests out with the python packages available in buster just in case as well as what can be had via pip.

Bstorm added a comment.Aug 1 2019, 3:49 PM

So all the machinery in this is currently working (except haven't yet tested/written the part authenticating as a service account -- the functions all work). This brings up an interesting problem: migration. If this and the old maintain-kubeusers just kind of fight back and forth, overwriting the .kube/config file, it won't do. There are some options here:

  1. Use contexts and dynamically switch them with an argument in webservice.
  2. Use a mechanism of environment variables. Scripts can choose the location of KUBECONFIG for each command with some wrappers.
  3. Migrate everything simultaneously in one glorious downtime by script, changing everyone's kubeconfig at that moment, requiring us to have two full-size clusters running at the same time and to make no mistakes at all after a lot of testing.

Contexts is the supported way of merging kubeconfigs (I do this locally). However, this implies users are using (directly or through a wrapper) kubectl. Also, the mechanism implies doing a context switch to change current-context in their .kube/config. This may not be so bad if we can keep the old maintain-kubeusers and this one only managing their context in the file and webservice and friends can provide some kind of interface to help people migrate services by changing context through the wrappers. It seems like a potentially confusing support issue, but it could work.

With env vars, we know that env vars are often changed by our users (and some users are doing things like using ansible to deploy tools) which is the support issue here. On the other hand, this definitely simplifies the issue of the 2-script-slap-fight problem. Each script would simply manage their own config file. The new one would eventually become .kube/config when the old cluster is removed.

Since I'm currently still convinced we need a bigger k8s cluster even now (at 39 workers), the 3rd option is pretty ambitious--though only briefly. It requires a whole lot of testing (including building test-cases for webservice with feature flags of some sort). I'm picturing the thing we forgot requiring a rollback script as well. It is possible this could be live for some time (instead of briefly) with some mechanism to allow early adopters to help us test while accepting a greater level of complexity and likely problems during the beta phase. I have no idea if there's enough hardware to support such a rollout...and for a beta phase, we'd need options 1 or 2 above on top of this.

I have other things to work on for this (auth, nfs writing behavior, etc.), but I'm going to move this for discussion.

Bstorm claimed this task.Aug 1 2019, 3:52 PM
Bstorm added a comment.Aug 2 2019, 4:13 PM

New repo
Uploading my scratch code there so people can see what I'm doing.

Change 527598 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: Start the script up

Bstorm added a comment.Aug 2 2019, 4:33 PM

The discussion item is T228499#5384661, since I've now messed up the order.

Change 527598 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: Start the script up

Change 529457 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: organize into a decent project

Change 529986 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: Add MIT license

Change 529988 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: update the code to a working and more consistent state

Change 529989 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: introduce standard python project structure

Change 529990 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: add test suite

Change 529457 abandoned by Bstorm:
maintain_kubeusers: organize into a decent project

Splitting into 4 patches

Change 529986 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: Add MIT license

Change 529988 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: update the code to a working and more consistent state

Change 529989 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: introduce standard python project structure

Change 529990 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: add test suite

Currently this is using the "edit" default clusterrole for new toolforge users. That is absolutely not what I'd like it to use for now. So I've dumped out the permissions that grants and have commented out the pieces I'd like to further restrict for toolforge. I'll include this in the PSP/RBAC ticket as well with a bit more explaination (T227290: Design and document how to integrate the new Toolforge k8s cluster with PodSecurityPolicy)

1# RBAC minimum perms for toolforge users, based on the "edit" system clusterrole with some permissions commented out:
2# verbs for R/O
3# ["get", "list", "watch"]
5# verbs for R/W
6# ["get", "list", "watch", "create", "update", "patch", "delete"]
8# resources needed by Toolforge users for sure:
9# pods, deployments,
12- apiGroups:
13 - ""
14 resources:
15 - pods/attach
16 - pods/exec
17 - pods/portforward
18 - pods/proxy
19 - secrets
20 - services/proxy
21 verbs:
22 - get
23 - list
24 - watch
25# - apiGroups:
26# - ""
27# resources:
28# - serviceaccounts
29# verbs:
30# - impersonate
31- apiGroups:
32 - ""
33 resources:
34 - pods
35 - pods/attach
36 - pods/exec
37 - pods/portforward
38 - pods/proxy
39 verbs:
40 - create
41 - delete
42 - deletecollection
43 - patch
44 - update
45- apiGroups:
46 - ""
47 resources:
48 - configmaps
49 - endpoints
50# - persistentvolumeclaims
51 - replicationcontrollers
52 - replicationcontrollers/scale
53 - secrets
54# - serviceaccounts
55 - services
56 - services/proxy
57 verbs:
58 - create
59 - delete
60 - deletecollection
61 - patch
62 - update
63- apiGroups:
64 - apps
65 resources:
66# - daemonsets
67 - deployments
68 - deployments/rollback
69 - deployments/scale
70 - replicasets
71 - replicasets/scale
72 - statefulsets
73 - statefulsets/scale
74 verbs:
75 - create
76 - delete
77 - deletecollection
78 - patch
79 - update
80# - apiGroups:
81# - autoscaling
82# resources:
83# - horizontalpodautoscalers
84# verbs:
85# - create
86# - delete
87# - deletecollection
88# - patch
89# - update
90- apiGroups:
91 - batch
92 resources:
93 - cronjobs
94 - jobs
95 verbs:
96 - create
97 - delete
98 - deletecollection
99 - patch
100 - update
101- apiGroups:
102 - extensions
103 resources:
104# - daemonsets
105 - deployments
106 - deployments/rollback
107 - deployments/scale
108 - ingresses
109 - networkpolicies
110 - replicasets
111 - replicasets/scale
112 - replicationcontrollers/scale
113 verbs:
114 - create
115 - delete
116 - deletecollection
117 - patch
118 - update
119# - apiGroups:
120# - policy
121# resources:
122# - poddisruptionbudgets
123# verbs:
124# - create
125# - delete
126# - deletecollection
127# - patch
128# - update
129- apiGroups:
130 -
131 resources:
132 - ingresses
133 - networkpolicies
134 verbs:
135 - create
136 - delete
137 - deletecollection
138 - patch
139 - update
140- apiGroups:
141 - ""
142 resources:
143 - configmaps
144 - endpoints
145 - persistentvolumeclaims
146 - pods
147 - replicationcontrollers
148 - replicationcontrollers/scale
149# - serviceaccounts
150 - services
151 verbs:
152 - get
153 - list
154 - watch
155- apiGroups:
156 - ""
157 resources:
158 - bindings
159 - events
160 - limitranges
161 - namespaces/status
162 - pods/log
163 - pods/status
164 - replicationcontrollers/status
165 - resourcequotas
166 - resourcequotas/status
167 verbs:
168 - get
169 - list
170 - watch
171- apiGroups:
172 - ""
173 resources:
174 - namespaces
175 verbs:
176 - get
177 - list
178 - watch
179- apiGroups:
180 - apps
181 resources:
182 - controllerrevisions
183 - daemonsets
184 - deployments
185 - deployments/scale
186 - replicasets
187 - replicasets/scale
188 - statefulsets
189 - statefulsets/scale
190 verbs:
191 - get
192 - list
193 - watch
194- apiGroups:
195 - autoscaling
196 resources:
197 - horizontalpodautoscalers
198 verbs:
199 - get
200 - list
201 - watch
202- apiGroups:
203 - batch
204 resources:
205 - cronjobs
206 - jobs
207 verbs:
208 - get
209 - list
210 - watch
211- apiGroups:
212 - extensions
213 resources:
214 - daemonsets
215 - deployments
216 - deployments/scale
217 - ingresses
218 - networkpolicies
219 - replicasets
220 - replicasets/scale
221 - replicationcontrollers/scale
222 verbs:
223 - get
224 - list
225 - watch
226- apiGroups:
227 - policy
228 resources:
229 - poddisruptionbudgets
230 verbs:
231 - get
232 - list
233 - watch
234- apiGroups:
235 -
236 resources:
237 - ingresses
238 - networkpolicies
239 verbs:
240 - get
241 - list
242 - watch

I was just thinking about the fact that we are applying RBAC to the user and not to the group (which seems more efficient). At this time, here is why (and this needs to be documented in the script): I do not support using the group annotation for overall cluster access when using x.509 certificates because this issue is not resolved yet. An issued cert exists until it expires, so RBAC is our primary means of immediately shutting down user access. This means we can tie some things to the group, but other things (write access to resources, at least) must be tied to the user until there is a reasonable mechanism for invalidating client certs.

This also points out the absolute need for this script to be able to renew certs that are still listed in the configmap as valid and the file still exists on the NFS.

One way we could invalidate certs is by ensuring that the configmap cannot be changed by users other than admins and having this script deliberately issue an expired cert to that user. Obviously, this is not even close to foolproof, but it would be an additional layer on top of removing the rolebinding.

Removing toolforge access is an effective removal as well if the API server is not publicly accessible of course. However, ensuring that the RBAC can be removed as well helps. Rebuilding the CA sounds like a poor option.

Change 530458 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: add basic expiration detection

Change 530606 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[integration/config@master] integration: add tests for the labs/tools/maintain-kubeusers repo

Change 530647 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: authenticate in k8s and out -- fix tests and docker

Change 530458 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: add basic expiration detection

Change 530647 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] maintain_kubeusers: authenticate in k8s and out -- fix tests and docker

Change 530606 merged by jenkins-bot:
[integration/config@master] integration: add tests for the labs/tools/maintain-kubeusers repo

Mentioned in SAL (#wikimedia-releng) [2019-08-28T23:00:38Z] <James_F> Zuul: add tests for the labs/tools/maintain-kubeusers repo T228499

I have managed a live test of this in a Kubernetes cluster with LDAP. New permissions were needed for the clusterrole. Additionally, the serviceaccount running this needs all permissions that it grants to other users because that's the rules in Kubernetes. Since it currently grants the clusterrole "edit" in a namespace, I had to give the sa that permission as a clusterrolebinding (because it must be able to do all those things in the target namespace). I kind of hate that, but it is necessary.

I have one last thing to ensure is working just to be sure: NFS. My test today used the pod's local volume. I need to change my pod definition to mount the persistentvolumebinding and test again.

Change 533364 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] live-testing: make this work inside a k8s cluster

Bstorm added a comment.Thu, Sep 5, 6:39 PM

So one thing learned from testing this with NFS: volume claims are namespaced. The only way to share an NFS mount across namespaces at this time is as a hostPath.

We should not attempt to use volume claims in our environment until we abandon "large NFS share for all" strategy as well.

To correctly mimic the behavior of the UID enforcer controller, T215678: Replace each of the custom controllers with something in a new Toolforge Kubernetes setup, maintain-kubeusers must apply a UID restriction to each user and namespaced default service account. Going to test that notion.