
Setup NSS inside containers used in Tool Labs
Closed, Resolved · Public

Description

Containers need an NSS config that contacts the labs LDAP for user / group information. This is required because:

  1. Our cluster enforces that tools must run as the specific UID associated with their LDAP account. This both protects against issues when tools run as root inside containers and makes NFS permissions work correctly.
  2. There is no user entry for this UID / GID inside the container (/etc/passwd, /etc/group, etc.). This will cause programs that attempt to get the name of the current user (which is a lot of them) to crash.

Figure out the appropriate NSS configuration to use inside containers, as well as how best to refresh and redeploy it.

Options include:

  1. Bake it into the container. This is simplest, but rebuilding and redeploying can take a while when changes are needed.
  2. Write the config out with puppet on the k8s worker nodes, and mount it read-only into containers by default with an admission controller.
  3. Something else.

(1) might be the simplest / right thing to do, but it'll make our containers useless outside of the labs environment. (2) is a bit ugly but very effective, and decouples container building from our environment-specific stuff. (3) could be a ConfigMap or a similar alternative, but I am not sure those will work in a reasonably foolproof manner.
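
For reference, the container-side NSS configuration would look roughly like this (a sketch only; the exact lookup sources depend on which libnss module we end up using):

  # /etc/nsswitch.conf (sketch): resolve users and groups from local files
  # first, then from the labs LDAP via the ldap NSS module
  passwd:   files ldap
  group:    files ldap
  shadow:   files ldap
  hosts:    files dns
  netgroup: files ldap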

Event Timeline

This should also run without nscd / nslcd caching daemons.

I realized after reading through more documentation that I don't actually want PAM but NSS.

yuvipanda renamed this task from Setup PAM inside containers used in Tool Labs to Setup NSS inside containers used in Tool Labs.May 10 2016, 7:56 AM
yuvipanda updated the task description. (Show Details)

It's actually NSS, not PAM.

After some experimentation, libnss-ldapd, which is the recommended setup, works *almost flawlessly* out of the box, except that it requires nslcd to be running :( This is not ideal, since we'd like each container to have only one logical process. We'd have to have some form of init process in each container to manage nslcd, and resource accounting becomes problematic as well. I'd very much prefer not to do this.

The other option is libnss-ldap, which is older, buggier, and doesn't seem to support all the features of libnss-ldapd, but doesn't require a daemon. @MoritzMuehlenhoff explicitly recommended against it, so I'd prefer not to use it, but it does work without requiring a daemon in most cases (it doesn't seem to support multiple groups per user; maybe that's just a configuration issue?). We also use libnss-ldapd in labs already, so switching would be a confusing change-up.

Our options realistically are:

  1. Figure out a way to run libnss-ldapd in containers without requiring the nslcd daemon
  2. Run one nslcd per worker node via a Kubernetes DaemonSet (which were designed for use cases like this), and mount the socket into all containers with an admission controller. This should be ok if we are comfortable with the security implications.
  3. Something else not listed here, hackier or nicer.

Ideally we would do (1), but barring that, I'm leaning towards (2), mostly because I can't think of a (3).

If we go with (2), we'll have to somehow configure libnss-ldapd to not install nslcd itself, but just talk to the appropriate socket.

The data presented by nslcd is identical on all hosts, so exploring (2) seems best to me.
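
A rough sketch of what (2) could look like as a DaemonSet; the namespace, names, and image here are placeholders, not anything we actually have:

  # Sketch of option (2): one nslcd per worker node, sharing its socket
  # directory with the host so the admission controller can mount it into pods.
  apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    name: nslcd
    namespace: kube-system
  spec:
    template:
      metadata:
        labels:
          app: nslcd
      spec:
        containers:
        - name: nslcd
          image: docker-registry.tools.wmflabs.org/nslcd:latest  # placeholder image
          volumeMounts:
          - name: nslcd-socket
            mountPath: /var/run/nslcd   # nslcd creates its socket in this directory
        volumes:
        - name: nslcd-socket
          hostPath:
            path: /var/run/nslcd        # shared with the host / other pods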

After more discussion with @MoritzMuehlenhoff, the options for pursuing (2) are:

  1. Patch the libnss-ldapd source package to build a libnss-ldapd-plain binary package, and remember to keep forward-porting this to all new releases. This can just go into one of our deb repos.
  2. Make a custom local build of libnss-ldapd without the nslcd dependency, and just dpkg -i it in the container. This is simpler than (1) but still complex.
  3. Just install libnss-ldapd as-is in all containers. nslcd will be installed but won't start anyway. We'll pay the disk cost of the nslcd binary, but that probably isn't much.

(3) is the one with the least amount of custom long-term sustaining effort needed from us, so I think we should try to do that :D
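
A sketch of what (3) could look like in the image build, assuming a Debian jessie base and the standard policy-rc.d trick to suppress daemon autostart (the real base container is in the Gerrit change below, and may differ):

  # Dockerfile sketch for option (3); the sed rule and layout are illustrative.
  FROM debian:jessie

  # Prevent maintainer scripts from starting any daemon (including nslcd)
  RUN printf '#!/bin/sh\nexit 101\n' > /usr/sbin/policy-rc.d && \
      chmod +x /usr/sbin/policy-rc.d

  # libnss-ldapd pulls in nslcd; it stays installed but never runs in the container
  RUN apt-get update && \
      DEBIAN_FRONTEND=noninteractive apt-get install -y libnss-ldapd && \
      rm -rf /var/lib/apt/lists/*

  # Point user/group lookups at LDAP in addition to the local files
  RUN sed -i 's/^\(passwd\|group\|shadow\):.*/& ldap/' /etc/nsswitch.conf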

Change 288464 had a related patch set uploaded (by Yuvipanda):
Add toollabs base container

https://gerrit.wikimedia.org/r/288464

Change 288464 merged by Yuvipanda:
Add toollabs base container

https://gerrit.wikimedia.org/r/288464

Change 288761 had a related patch set uploaded (by Yuvipanda):
tools: Enable host automounts

https://gerrit.wikimedia.org/r/288761

Change 288762 had a related patch set uploaded (by Yuvipanda):
k8s: Actually enable host automounter

https://gerrit.wikimedia.org/r/288762

Change 288762 abandoned by Yuvipanda:
k8s: Actually enable host automounter

https://gerrit.wikimedia.org/r/288762

We have a fairly decent solution for this now. We've set up libnss-ldapd, and nslcd won't start by default because we've suppressed autostart of packages anyway. We've written a k8s admission controller that lets us automatically mount specific paths from the host into all containers, and configured it to automount /var/run/nslcd/socket. This now works for all containers built off of docker-registry.tools.wmflabs.org/jessie-toollabs.
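
For the record, the effect of the automount on a pod is roughly equivalent to the following being injected into its spec (a sketch of the injected fields with a placeholder container name, not the admission controller's literal output):

  # Approximate effect of the automount admission controller on every pod:
  # the host's nslcd socket is bind-mounted in, so libnss-ldapd inside the
  # container can talk to the nslcd running on the worker node.
  spec:
    containers:
    - name: tool                         # placeholder
      volumeMounts:
      - name: nslcd-socket
        mountPath: /var/run/nslcd/socket
    volumes:
    - name: nslcd-socket
      hostPath:
        path: /var/run/nslcd/socket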

Need to figure out if we need nscd.

We do need nscd, otherwise it is too slow :(

Change 288761 merged by Yuvipanda:
tools: Enable host automounts

https://gerrit.wikimedia.org/r/288761

It seems fast enough without nscd caching. We can reopen if we get actual complaints about it being slow.