
evaluate possibility for nscd use with useldap
Open, Normal, Public

Description

Our nfsd setup has a special override that makes rpc.mountd query LDAP instead of the system default.

dpkg -L nfs-kernel-server
diverted by nfsd-ldap to: /usr/sbin/rpc.mountd.real

The replacement /usr/sbin/rpc.mountd is a wrapper:

LD_PRELOAD=/usr/lib/nfsd-ldap/useldap.so exec /usr/sbin/rpc.mountd.real "$@"

Without the preload, this user would not normally exist on this host:

root@labstore1001:~# LD_PRELOAD=/usr/lib/nfsd-ldap/useldap.so getent passwd 11514
angel1:*:11514:500:Angel1:/home/angel1:/bin/bash

root@labstore1001:~# getent passwd 11514; echo $?
2
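The exit status of 2 is getent(1)'s "key not found" code (0 means the key was found). A minimal C sketch of the same single-key lookup, assuming glibc; the function name `getent_passwd_uid` is hypothetical and not part of any library. Because it goes through `getpwuid()`, it uses whatever NSS configuration the process has, so under the useldap.so preload it would consult LDAP before files.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* getent(1)-style status codes used by this sketch. */
enum { GETENT_OK = 0, GETENT_NOT_FOUND = 2 };

/* Hypothetical helper: resolve one uid through the process's NSS sources
 * and print it to `out` in passwd(5) format. Returns 0 if found, 2 if not.
 * (Simplification: lookup errors are also reported as "not found".) */
int getent_passwd_uid(uid_t uid, FILE *out)
{
    assert(out != NULL);
    struct passwd *pw = getpwuid(uid);
    if (pw == NULL)
        return GETENT_NOT_FOUND;
    fprintf(out, "%s:%s:%lu:%lu:%s:%s:%s\n",
            pw->pw_name, pw->pw_passwd,
            (unsigned long)pw->pw_uid, (unsigned long)pw->pw_gid,
            pw->pw_gecos, pw->pw_dir, pw->pw_shell);
    return GETENT_OK;
}
```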

The preload library also explicitly disables nscd for the process:

useldap.c

// The nscd daemon may operate under different rules, and it is
// therefore important that it not be used by the executable.  There

...

static void __attribute__((constructor)) _do_useldap(void)
{
    __nss_configure_lookup("passwd", "ldap files");
    __nss_configure_lookup("group", "ldap files");
    __nss_disable_nscd(_faux_register_file);
}
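The constructor is what lets the preload work without modifying rpc.mountd: the dynamic loader runs `__attribute__((constructor))` functions before the program's `main`, so NSS source selection is changed before the first lookup. A minimal, self-contained sketch of the same idea, assuming glibc (where `<nss.h>` declares `__nss_configure_lookup`). It uses `"files"` instead of `"ldap files"` so it runs on a host without LDAP, omits the private `__nss_disable_nscd` call, and its function names are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <nss.h>        /* glibc-specific: __nss_configure_lookup */
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Result of the override; -1 means it failed or the constructor never ran. */
static int configure_status = -1;

/* Runs before main(), like _do_useldap in useldap.c. */
static void __attribute__((constructor)) demo_configure_nss(void)
{
    /* Returns 0 on success, -1 if the database or service line is invalid. */
    configure_status = __nss_configure_lookup("passwd", "files");
}

int nss_override_status(void)
{
    return configure_status;
}

/* Look up a user name using the overridden "passwd" sources. */
const char *user_name_for_uid(uid_t uid)
{
    /* The override must already be in place before any lookup. */
    assert(configure_status == 0);
    struct passwd *pw = getpwuid(uid);
    return pw != NULL ? pw->pw_name : NULL;
}
```

Built as a shared object and loaded through `LD_PRELOAD`, the constructor would run the same way, before the target program's `main`.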

The nscd cache is doing very little here; its hit counts are low:

passwd cache:

        258  cache hits on positive entries
          1  cache hits on negative entries

group cache:

        267  cache hits on positive entries
          8  cache hits on negative entries

By comparison, on a random VM within Labs:

root@nfs-server:~# time getent passwd 11514
angel1:x:11514:500:Angel1:/home/angel1:/bin/bash

real	0m0.001s

After the initial lookup (which took about .005s, presumably before the cache was warm), a user lookup hovers between .001 and .003 seconds.
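`time getent ...` also counts process start-up. To look at the lookup alone, a minimal C sketch, assuming a POSIX system with `clock_gettime`, that times `getpwuid()` calls against a monotonic clock; the function names are hypothetical. Keeping the worst of several runs is one way to surface occasional slow outliers instead of relying on a single sample.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <sys/types.h>
#include <time.h>

/* Time one getpwuid() call in seconds. *found is set to 1 if the uid
 * resolved, 0 otherwise. */
double time_uid_lookup(uid_t uid, int *found)
{
    assert(found != NULL);
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    struct passwd *pw = getpwuid(uid);
    clock_gettime(CLOCK_MONOTONIC, &end);
    *found = (pw != NULL);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Run n lookups and return the slowest one. */
double worst_of_n_lookups(uid_t uid, int n)
{
    assert(n > 0);
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        int found;
        double t = time_uid_lookup(uid, &found);
        if (t > worst)
            worst = t;
    }
    return worst;
}
```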

On labstore a lookup can be as fast as .006 or .007 seconds, but it is not uncommon for it to be higher. I have been watching it for a while and have sometimes seen it hover around a second per lookup, and the first time I was poking at this I saw lookups that took several seconds. That seems to be uncommon, but I would like to understand why using the nscd cache with useldap.so is harmful, and whether an equivalent strategy is available.

We do have monitoring that alerts when this lookup takes longer than 1s. I'm not sure of the volume of queries the NFS server makes, or what the impact of per-request query times like that would be.
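The threshold logic of such a check can be sketched with the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). This is not the actual "Getent speed check", whose implementation isn't shown here; the 1.0 s critical threshold comes from the text above, while the warning threshold and function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Standard Nagios/Icinga plugin exit codes. */
enum check_state { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Map a measured lookup time in seconds to a plugin state. */
enum check_state classify_lookup(double elapsed, double warn, double crit)
{
    assert(warn <= crit);
    if (elapsed > crit)
        return STATE_CRITICAL;
    if (elapsed > warn)
        return STATE_WARNING;
    return STATE_OK;
}

/* Print a one-line status in plugin style and return the exit code. */
int report(double elapsed, double warn, double crit)
{
    static const char *const names[] = { "OK", "WARNING", "CRITICAL" };
    enum check_state s = classify_lookup(elapsed, warn, crit);
    printf("GETENT %s - lookup took %.3f s\n", names[s], elapsed);
    return (int)s;
}
```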

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=labstore1001&service=Getent+speed+check

Event Timeline

chasemp created this task. Jan 27 2016, 10:34 PM
chasemp updated the task description.
chasemp raised the priority of this task from to Normal.
chasemp added a project: Cloud-Services.
chasemp added a subscriber: chasemp.
Restricted Application added a subscriber: Aklapper. Jan 27 2016, 10:34 PM
mark set Security to None.
mark added a subscriber: mark.
mark added a subscriber: faidon.
bd808 moved this task from Backlog to Shared Storage on the Data-Services board. Jul 24 2017, 12:40 AM