Page MenuHomePhabricator

Labs NFSv4/idmapd mess
Closed, ResolvedPublic

Description

I've been looking at the situation surrounding NFSv4 idmapping in Labs, because of the effects of T78076 and explanations that did not make sense to me.

It looks like the current situation is severly buggy and very messy due to multiple reasons:

  • idmapd in the kernel is enabled by default in kernels < 3.3 and disabled in kernels >= 3.3 (cf. Linux upstream commit). This means that the behavior across precise & trusty (and jessie) is different, which in turns means that half of the Labs fleet does idmapping and the other does not(!)
  • Moreover, there used to be an upstart script in our repository that explicitly disabled idmapping but this is not applied anymore. Confusingly enough, the upstart script was left over in our repository, although referenced from nowhere (I killed this). This seems to mostly be a pmtpa zone thing according to git log; however, a salt run across Labs revealed a single eqiad instance having this upstart script in place.
  • Even if that upstart script was applied, it would be broken anyway, as that sysfs setting needs to be applied before any mounts happen; a better strategy would have been to echo options nfs nfs4_disable_idmapping=1 > /etc/modprobe.d/nfs-disable-idmap, which is simpler to code anyway.
  • Even though the intention seems to be to have idmapping disabled, labstore1001 includes ldap::role::client::labs, which additionally breaks prod logins to the server, making it a one of a kind across our (prod) fleet. Moreover, it has the incredibly ugly User['apache'] definition. This seems consistent with solving idmapping issues, although the intention was probably the opposite.

This is obviously a very messy situation. I propose the following (and I'd like the Labs team, probably @coren, to implement it, if there's agreement):

  • Disable idmapping everywhere. It's fully supported by RFCs nowadays. Not disabling it would mean that we'd have to instantiate all potential users across our puppet tree into labstore which would be... gross. I suggest doing this by instantiating a modprobe.d file (as above) under an os_version('ubuntu < trusty') guard; note that this will need a reboot of the affected instances for both the kernel option to be set and the mounts to be redone.
  • Clean up all possible remnants of that upstart script.
  • Remove all the Labs LDAP stanzas (and User/Group apache) from labstore1001. This is currently a liability as labstore1001 despite being a production host has a different security model than the rest of production.
  • Enforce uid/gids in our User/Group stanzas across all of our puppet tree. This is a good idea for production anyway, so it's not a Labs-only requirement.

Event Timeline

faidon assigned this task to coren.
faidon raised the priority of this task from to High.
faidon updated the task description. (Show Details)
faidon added a project: Cloud-Services.
faidon added subscribers: faidon, coren.

I see two big problems with this - implementation issues rather than a conceptual ones:

(1) right now the numbering space between production and labs is disjoint, and possibly overlapping (I think it is, but mostly for things that matter very little).

Because Labs allocates a large number of uid and gids, we either need to explicitly set ranges aside that are verbotten in production (and ensure that all currently extant users and groups fall inside of those ranges) or promote the Labs numbering as authoritative over prod (which is, clearly, an issue).

(2) some system users are managed by dpkg and not puppet; this means that the numbering of those users and groups is instance-local and unpredictable.

I'm going to be investigating what can be done about latter problem - the former is relatively easy, just lots of work.

After investigation and testing with labstore2001, disabling idmap entirely will be approximately a noop but will require a reboot and restart of labstore. (I plan on using the restart required by the shelves being moved this week to perform it).

It's only an approximate noop because the managed-by-apt user issue that remains a potential problem, but in practice system users are not writing to NFS at this time so the net effect will be null. It will be a caveat to make a note of for the future as I can find no reliable way to ensure that system-managed UID match accross the park).

Despite the success with labstore2001, the instabilities of the past three weeks have made me weary of making a change of this magnitude on labstore1001. I would suggest waiting for a week or two with no significant Labs outage before proceeding?

OK, waiting a week has been prudent. Perhaps we should move this forward again ?

I'm going to place this on our "sprint" for next week (Mar 30)

coren set Security to None.
coren moved this task from Backlog to Doing on the Labs-Q4-Sprint-1 board.

So, after some testing, I believe I found an order of operation which doesn't break too many things and allows us to phase out idmap entirely:

  • make certain that any file owner in the shared NFS filesystem is owned by a user that is in LDAP or invariant across servers (root being the obvious one)
  • disable idmapping with the module option and in /etc/init in puppet (the pmtpa setting used to work because it was concurrent with autofs which by necessity preceded any NFS mount; this no longer is in use)
  • restart precise instances with the new settings
  • At this point it is safe to turn off idmap on the server but not remove LDAP
  • review and adapt code running on labstore that rely on usernames and project groups (mostly for database credentials) to query LDAP directly and not get*ent()
  • disable LDAP client on labstores
  • enable the admin class on labstores

This was tested as working with precise instances against labstore1001 for existing instances, and by pointing a new precise instance at a fake share on labstore1003 (which doesn't have LDAP). Trusty instances have been tested to not alter behaviour when idmap is turned off on the target server (they aren't using idmap), and Jessie is likewise unaffected.

It was tested to be possible to disable idmap on running instances, provided every NFS filesystem can first be unmounted, but extensive testing shows that - in practice - this is nearly impossible to do for about 90% of instances because any process holding a file open (including cwd) on /any/ NFS mountpoint suffices to prevent reloading the nfs kernel module.

Tool Labs can be managed with minimal downtime as it is possible to drain usage away from instances running user code before rebooting it; staggering restarts can allow the entire process to be done with little to no visible downtime.

Will create tasks for all the steps shortly.

This order now demonstrably works with Precise, with both the unmount-everything and the just-reboot methods allowing NFS-by-uid.

Next up is schedule a restart of the NFS on all extant precise instances (I'm going to do as many of them with the unmount dance first)

coren moved this task from Backlog to Doing on the Labs-Q4-Sprint-4 board.
coren moved this task from Doing to Code Review / Blocked on the Labs-Q4-Sprint-4 board.

The solution I thought of was using the internal glibc function __nss_configure_lookup to explicitly configure LDAP for mountd (while the rest of the system will have an nsswitch.conf that does not include LDAP at all).

This code:

#include <nss.h>

static void __attribute__((constructor)) nssldapwrap(void) {
	__nss_configure_lookup("passwd", "files ldap");
	__nss_configure_lookup("group", "files ldap");
}

("files" should be only needed for the root user). The above works to build an LD_PRELOAD wrapper, but we could just as easily write a proper wrapper that configures look ups and then execve()s mountd.

To integrate this with the rest we'll need:

  • Building a wrapper with the above code as a Debian package. That package can ship a dpkg-divert in the postinst that would divert rpc.mountd to rpc.mountd.real and ship the wrapper as rpc.mountd. This would make init scripts and such for NFS work as normal and make this whole tricker transparent to the rest of the system.
  • Proper puppetization for the other components (factored out out of the mess that LDAP puppet classes are). We'll need the whole NSS LDAP config like we would normally do, only this time with the exception of not needing an LDAPized nsswitch.conf.

I don't think I can invest the time for the above at the moment but it should be fairly straightforward. I'll let the Labs team pick it up and I'll be here if any help is needed.

It's high on my radar but Mark asked me to priorize other things (I had just brought it up this week for the sprint). I'll still try to squeeze it in.

There is now a debian package candidate with the required thunk/shim. The code works, but it's likely to packaging itself is insane and needs review.

https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/nfsd-ldap

The package is built and now lives in the Wikimedia repo. Testing on labstore2001 shows that nfsd succesfully starts with the shim invoked and no visible issues as expected, but not all aspects could be tested with certainty since the box no longer has all the right filesystems.

That said, this gives me good confidence for Monday's switch since - at the very least - nfsd works when started that way and continues to poke LDAP even if nsswitch.conf was temporarily fixed to not do so.

NFS was restarted on labstore1001 and its memory map confirms that the LDAP shim is in place:

7f4c9b5f4000-7f4c9b5f5000 r-xp 00000000 09:00 53870988                   /usr/lib/nfsd-ldap/useldap.so
7f4c9b5f5000-7f4c9b7f4000 ---p 00001000 09:00 53870988                   /usr/lib/nfsd-ldap/useldap.so
7f4c9b7f4000-7f4c9b7f5000 rw-p 00000000 09:00 53870988                   /usr/lib/nfsd-ldap/useldap.so

Last steps are:

  • removing the systemwide LDAP config (keeping just the .conf for the shim)
  • Finally turning base::admin on for the labstores

Change 254881 had a related patch set uploaded (by coren):
Labs: Have fileservers no longer nsswitch to LDAP

https://gerrit.wikimedia.org/r/254881

Change 254881 merged by coren:
Labs: Have fileservers no longer nsswitch to LDAP

https://gerrit.wikimedia.org/r/254881

Change 254885 had a related patch set uploaded (by coren):
Fix nsswitch_use_default to a string

https://gerrit.wikimedia.org/r/254885

Change 254885 merged by coren:
Fix nsswitch_use_default to a string

https://gerrit.wikimedia.org/r/254885

Change 254941 had a related patch set uploaded (by coren):
Fix the script not passing arguments to the binary

https://gerrit.wikimedia.org/r/254941

Change 254941 merged by coren:
Fix the script not passing arguments to the binary

https://gerrit.wikimedia.org/r/254941

labstore* NFS daemons now use the ldap shim and grab group membership information from LDAP regardless of the systemwide setting.

Left: turn off LDAP in nsswitch.conf (there remains a hiera issue) and enable base::admin

Change 255118 had a related patch set uploaded (by coren):
Labs: remove has_admin override for labs::nfs::fileserver

https://gerrit.wikimedia.org/r/255118

Change 255118 merged by coren:
Labs: remove has_admin override for labs::nfs::fileserver

https://gerrit.wikimedia.org/r/255118

Change 255119 had a related patch set uploaded (by coren):
Labs: remove labstore special case from enforce-users-groups

https://gerrit.wikimedia.org/r/255119

Change 255119 merged by coren:
Labs: remove labstore special case from enforce-users-groups

https://gerrit.wikimedia.org/r/255119

This is done; labstore* hosts are now normal prod servers with base::admin.