Replace each of the custom controllers with something in a new Toolforge Kubernetes setup
Closed, Resolved (Public)

Description

There are four custom controllers in the Toolforge Kubernetes cluster. They currently keep us from upgrading quickly, and much of what they do is now more easily available in modern Kubernetes. We need to ensure each one is effectively replaced, then document and deploy those replacements.

For each one of these, ensure we have comparable functionality enabled or deployable in some way, except where we want to retire a feature altogether.

Details

Related Gerrit Patches:
labs/tools/maintain-kubeusers (master): podpresets: Create a pod preset to automount hostpaths and set HOME for tools
operations/puppet (production): toolforge-k8s: enable the settings API and PodPreset
labs/tools/registry-admission-webhook (master): webhook: Add first run of code

Event Timeline

Bstorm triaged this task as High priority. Feb 8 2019, 11:42 PM
Bstorm created this task.

Note that PodSecurityPolicy is in beta and might provide all that we need in 1.13, per the prediction in the docs: https://kubernetes.io/docs/concepts/policy/pod-security-policy/

Also, "in beta" means it is fair game for us to use in k8s, since ingresses, config files for kubeadm, etc. are beta as well.

The UID enforcer might well be replaceable with pod security policies. The container registry check seems like it's going to need a webhook. Automount and hostpath appear to be easily replaced with pod security policies.

GTirloni removed a subscriber: GTirloni. Mar 21 2019, 9:06 PM
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.
Bstorm added a subscriber: GTirloni.
NOTE: Pod security policies are available in 1.12, which is used in production. We don't need a newer version to use that functionality to replace 3 of 4 controllers.

The webhook could be done in flask/wsgi so that the team finds it easier to maintain, but it might be more flexible and quicker to deploy if done in Go (where the actual objects from the k8s source can be imported to interpret the API objects).

Bstorm removed a subscriber: GTirloni. May 21 2019, 3:06 PM

Hey, I'm just catching up. I wasn't aware of the existence of this phab task, and followed all your same steps until I got here with your very same conclusions :-)

Bstorm added a comment. Jun 5 2019, 9:20 PM

Apparently also the team is cool with the webhook being Go, which is easier in some ways, so I'll try to refocus at some point soon and add that.

Requested a gerrit repo for the webhook code.

Finished a testable (has three unit tests), working (on my local copy of k8s) webhook. As soon as the repo is created, I'll push it up there. Deployment is kinda weird right now (you use a couple scripts to have the k8s master generate certs for you, then kubectl create it), but I prefer that to deploying with webservice. Going to see if I can make it have more than one replica or something to keep it more HA.

Might dress up the code and autogen docs a bit more as well. Hrm. It needs a readme. This should be far more maintainable than custom compiled stuff in kubernetes that won't work between major versions and fails tests that take longer than 20 min. to run.

Right now, it runs a multi-stage docker image build on a base of scratch rather than Debian. There is exactly zero reason to use anything other than scratch for a golang or similar statically-linked binary container. So I also have to work out how to make that fit into our usual image build process.

Just FYI, when you try to violate the policy with a pod that isn't in the kube-system namespace (without that exemption, the policy was blocking system activity), it gives you:

Error from server: error when creating "busybox.yaml": admission webhook "registry-admission.tools.wmflabs.org" denied the request: Only WMCS-approved docker registry allowed
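
The check behind that message can be sketched roughly as follows. This is a Python illustration of the logic only, not the actual webhook (which is written in Go). The registry hostname `docker-registry.example.org`, the names `pod_images` and `review_pod`, and the choice to exempt only `kube-system` inside the function are all assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical registry host; the approved registry is not named in this task.
ALLOWED_REGISTRY = "docker-registry.example.org"
# Assumption: system pods are exempted by namespace.
EXEMPT_NAMESPACES = {"kube-system"}
DENY_MESSAGE = "Only WMCS-approved docker registry allowed"


def pod_images(pod: Dict[str, Any]) -> List[str]:
    """Collect the image of every container and init container in a pod."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [c.get("image", "") for c in containers]


def review_pod(pod: Dict[str, Any]) -> Tuple[bool, str]:
    """Return (allowed, message), mimicking an admission decision."""
    namespace = pod.get("metadata", {}).get("namespace", "default")
    if namespace in EXEMPT_NAMESPACES:
        return True, ""
    for image in pod_images(pod):
        # Require "<registry>/" so a look-alike host such as
        # "docker-registry.example.org.evil/x" is rejected.
        if not image.startswith(ALLOWED_REGISTRY + "/"):
            return False, DENY_MESSAGE
    return True, ""
```

A pod in a tool namespace that asks for plain `busybox` would be denied here, while the same pod in `kube-system` would pass.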

Change 517471 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/registry-admission-webhook@master] webhook: Add first run of code

https://gerrit.wikimedia.org/r/517471

Change 517471 merged by Bstorm:
[labs/tools/registry-admission-webhook@master] webhook: Add first run of code

https://gerrit.wikimedia.org/r/517471

Bstorm updated the task description. Jun 20 2019, 4:08 PM

Tentatively crossing off the registry validation bit because the webhook is deployable. I fully expect to find ways it isn't finished when we are rolling things out in toolsbeta.

The uidenforcer admission controller appears to be a combination of "don't run as root" and a substitute for RBAC on a cluster where RBAC doesn't exist.

I think we may need to add a PodPreset injection for the automounter. However, I'm more concerned about restricting mounts than forcing them. I'll test that.

Bstorm updated the task description. Oct 3 2019, 8:59 PM

The maintain-kubeusers service now takes care of 2 more of the controllers using PSP and RBAC controls.

Bstorm added a comment. Oct 3 2019, 9:09 PM

Currently, the host automounter controller mounts the following as read-only mounts:

--host-automounts=/etc/ldap.conf,/etc/ldap.yaml,/etc/novaobserver.yaml,/var/run/nslcd/socket

The new PSP allows those paths to be mounted only read-only, except /var/run/nslcd/socket, which is intentionally disallowed because we are moving to sssd. While the others could have value inside pods, force-mounting them does not necessarily seem useful except as a default, since they cannot be mounted read-write. Also, a novaobserver replacement really should use pod-mounted credentials such as a service account (this needs some definition; see T233372). It might be worth including a mutating webhook that mounts /var/lib/sss/pipes, at least in tool namespaces.
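
As a rough illustration of that rule, here is a small Python sketch of how an allowed-host-path list with a read-only flag behaves. It is a simplified model, not the PSP implementation: the names `AllowedHostPath` and `mount_allowed` are made up for the example, the list contains only the three paths discussed above (the real policy presumably allows more), and paths containing `..` are not normalized.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Sequence


class AllowedHostPath(NamedTuple):
    path_prefix: str
    read_only: bool


# Assumption: only the old automount paths are listed here, and
# /var/run/nslcd/socket is deliberately left out.
ALLOWED = (
    AllowedHostPath("/etc/ldap.conf", True),
    AllowedHostPath("/etc/ldap.yaml", True),
    AllowedHostPath("/etc/novaobserver.yaml", True),
)


def mount_allowed(
    host_path: str,
    read_only: bool,
    allowed: Sequence[AllowedHostPath] = ALLOWED,
) -> bool:
    """Return True if a hostPath mount would pass the allow-list."""
    requested = PurePosixPath(host_path)
    for rule in allowed:
        prefix = PurePosixPath(rule.path_prefix)
        # Match on whole path elements, so "/etc/ldap.conf.bak" does not
        # match the "/etc/ldap.conf" prefix.
        if requested == prefix or prefix in requested.parents:
            # A read-only rule only admits read-only mounts.
            return read_only or not rule.read_only
    # Anything not listed is rejected.
    return False
```

With this list, `/etc/ldap.conf` passes only when mounted read-only, and `/var/run/nslcd/socket` is rejected either way.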

Mentioned in SAL (#wikimedia-cloud) [2019-10-25T23:41:32Z] <bstorm_> Deployed custom webhook controllers for registry and ingress checking to toolsbeta-test kubernetes cluster T215531 T215678 T234231

Change 546764 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge-k8s: enable the settings API and PodPreset

https://gerrit.wikimedia.org/r/546764

I've done a POC of PodPreset locally and observed how it works. I think it might be worth it.

A PodPreset is defined per namespace and selects pods by label. Unless a pod carries an opt-out annotation, any pod in that namespace whose labels match the preset's selector will be altered by the PodPreset. The obvious use case is the label tools.wmflabs.org/webservice: "true", which webservice applies now, and perhaps an additional label to make it easy for non-webservice pods to use a preset that mounts all the "standard" items as well.
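
The selection rule can be sketched in Python as below. This only models the matching, not the mutation of the pod. The function name `preset_applies` is made up, only `matchLabels` selectors are handled, and the annotation key is the one I recall from the upstream PodPreset docs, so treat it as an assumption and check it against the cluster version.

```python
from typing import Any, Dict

# Assumed from the upstream PodPreset documentation; verify before relying on it.
EXCLUDE_ANNOTATION = "podpreset.admission.kubernetes.io/exclude"


def preset_applies(pod: Dict[str, Any], preset: Dict[str, Any]) -> bool:
    """Return True if the preset would be injected into the pod."""
    meta = pod.get("metadata", {})
    # An explicit opt-out wins over any label match.
    if meta.get("annotations", {}).get(EXCLUDE_ANNOTATION) == "true":
        return False
    # Presets are namespaced and only see pods in their own namespace.
    if meta.get("namespace") != preset["metadata"]["namespace"]:
        return False
    wanted = preset["spec"].get("selector", {}).get("matchLabels", {})
    labels = meta.get("labels", {})
    # Every selector label must be present with the same value.
    return all(labels.get(key) == value for key, value in wanted.items())
```

Given a preset in tool-example that selects tools.wmflabs.org/webservice: "true", a webservice pod in that namespace matches, while a pod without the label, in another namespace, or carrying the opt-out annotation does not.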

Currently, 26 pods in Toolforge are not labeled tools.wmflabs.org/webservice: "true".

This is a working PodPreset I used locally that functions well alongside the PodSecurityPolicies we are using:

apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  namespace: tool-example
  name: mount-toolforge-vols
spec:
  selector:
    matchLabels:
      tools.wmflabs.org/webservice: "true"
  volumeMounts:
    - mountPath: /public/dumps/
      name: dumps
      readOnly: true
    - mountPath: /data/project/
      name: home
    - mountPath: /etc/wmcs-project
      name: wmcs-project
      readOnly: true
    - mountPath: /data/scratch/
      name: scratch
    - mountPath: /etc/ldap.conf
      readOnly: true
      name: etcldap-conf
    - mountPath: /etc/ldap.yaml
      name: etcldap-yaml
      readOnly: true
    - mountPath: /etc/novaobserver.yaml
      name: etcnovaobserver-yaml
      readOnly: true
    - mountPath: /var/lib/sss/pipes
      name: sssd-socket
  volumes:
    - hostPath:
        path: /public/dumps
        type: Directory
      name: dumps
    - hostPath:
        path: /data/project
        type: Directory
      name: home
    - hostPath:
        path: /etc/wmcs-project
        type: File
      name: wmcs-project
    - hostPath:
        path: /data/scratch
        type: Directory
      name: scratch
    - hostPath:
        path: /etc/ldap.conf
        type: File
      name: etcldap-conf
    - hostPath:
        path: /etc/ldap.yaml
        type: File
      name: etcldap-yaml
    - hostPath:
        path: /etc/novaobserver.yaml
        type: File
      name: etcnovaobserver-yaml
    - hostPath:
        path: /var/lib/sss/pipes
        type: Directory
      name: sssd-socket

Change 546764 merged by Bstorm:
[operations/puppet@production] toolforge-k8s: enable the settings API and PodPreset

https://gerrit.wikimedia.org/r/546764

Change 547353 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] podpresets: Create 2 pod presets to automount hostpaths for tools

https://gerrit.wikimedia.org/r/547353

Change 547353 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] podpresets: Create a pod preset to automount hostpaths and set HOME for tools

https://gerrit.wikimedia.org/r/547353

Bstorm closed this task as Resolved. Oct 31 2019, 9:28 PM
Bstorm updated the task description.

And that completes this task. With two webhooks, PodSecurityPolicy, PodPreset, RBAC and the new maintain-kubeusers, we do not need the compiled-in custom controllers to make Kubernetes what we want it to be, and it will now know several new tricks.