While migrating tools-worker-* nodes to sssd, I found that the sssd version in Jessie doesn't support the config we use for stretch. The config diff seems to be minimal, but it is enough to create problems.
|operations/docker-images/toollabs-images : master||docker: add support for "testing" tags|
|operations/puppet : production||sssd: include the /etc/ldap.yaml file|
|operations/puppet : production||ldap client: sssd: introduce jessie-specific bits in sssd.conf|
- Mentioned In
  - rODIT757804fd2555: docker: add support for "testing" tags
  - T229058: Replace the nslcd mount in containers from the old Toolforge cluster with something that will work with sssd in the new one
  - T224651: Manual update - stale file handle
- Mentioned Here
  - T224651: Manual update - stale file handle
Issues continue despite the patch https://gerrit.wikimedia.org/r/513091.
I now have two more clues:
- /etc/ldap.yaml, which is used by the SSH key lookup tool, is a directory instead of a regular file on a freshly created jessie VM:
root@tools-worker-1002:~# /usr/sbin/ssh-key-ldap-lookup aborrero
Traceback (most recent call last):
  File "/usr/sbin/ssh-key-ldap-lookup", line 138, in <module>
    main()
  File "/usr/sbin/ssh-key-ldap-lookup", line 114, in main
    with open('/etc/ldap.yaml') as f:
IOError: [Errno 21] Is a directory: '/etc/ldap.yaml'
root@tools-worker-1002:~# file /etc/ldap.yaml
/etc/ldap.yaml: directory
This prevents normal SSH logins for every user.
- a PAM package seems to be missing. There are probably differences in how package dependencies are set up between jessie and stretch.
May 30 11:29:46 tools-worker-1003 sshd: PAM unable to dlopen(pam_ldap.so): /lib/security/pam_ldap.so: cannot open shared object file: No such file or directory
which is why after changing to sssd/sudo nothing works, including our virsh console trick.
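The first clue above (/etc/ldap.yaml being a directory) could be checked on other hosts with something like the following. This is only a minimal Python 3 sketch; the helper name `describe_path` is made up for illustration and is not part of ssh-key-ldap-lookup or the puppet code.

```python
import os
import stat


def describe_path(path):
    """Classify a path as 'missing', 'file', 'directory' or 'other'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISDIR(mode):
        return "directory"
    # Sockets, devices, FIFOs and so on.
    return "other"
```

Calling `describe_path("/etc/ldap.yaml")` on a healthy host should return `"file"`; on the broken jessie VM shown above it would return `"directory"`.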
The PAM issue may require running pam-auth-update --force --package on the server, because the PAM config still contains stale entries pointing to pam_ldap.so, which we no longer use after switching to sssd.
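To see which PAM config files still reference pam_ldap.so (before or after running pam-auth-update), a scan along these lines might help. It is a rough Python 3 sketch, assuming the PAM service files live in /etc/pam.d as usual on Debian; the function name `find_stale_pam_entries` is hypothetical.

```python
import os


def find_stale_pam_entries(pam_dir="/etc/pam.d", module="pam_ldap.so"):
    """Return (path, line number, line) for each active line naming `module`."""
    hits = []
    for name in sorted(os.listdir(pam_dir)):
        path = os.path.join(pam_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                stripped = line.strip()
                # Skip blank lines and comments.
                if not stripped or stripped.startswith("#"):
                    continue
                # The module may be given as a bare name or as a full path.
                if any(os.path.basename(tok) == module for tok in stripped.split()):
                    hits.append((path, lineno, stripped))
    return hits
```

An empty result from `find_stale_pam_entries()` would suggest that no active PAM entry points at pam_ldap.so anymore.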
Current status as of this comment: the only Toolforge Debian Jessie server running sssd is tools-worker-1029, which was created explicitly for testing it.
The rest of the Jessie systems aren't running sssd; they still use the classic stack (nslcd/nscd/sudoldap):
tools-elastic-01.tools.eqiad.wmflabs
tools-elastic-02.tools.eqiad.wmflabs
tools-elastic-03.tools.eqiad.wmflabs
tools-flannel-etcd-01.tools.eqiad.wmflabs
tools-flannel-etcd-02.tools.eqiad.wmflabs
tools-flannel-etcd-03.tools.eqiad.wmflabs
tools-k8s-etcd-01.tools.eqiad.wmflabs
tools-k8s-etcd-02.tools.eqiad.wmflabs
tools-k8s-etcd-03.tools.eqiad.wmflabs
tools-k8s-master-01.tools.eqiad.wmflabs
tools-prometheus-01.tools.eqiad.wmflabs
tools-prometheus-02.tools.eqiad.wmflabs
tools-proxy-03.tools.eqiad.wmflabs
tools-proxy-04.tools.eqiad.wmflabs
tools-puppetmaster-01.tools.eqiad.wmflabs
tools-redis-1001.tools.eqiad.wmflabs
tools-redis-1002.tools.eqiad.wmflabs
tools-worker-1001.tools.eqiad.wmflabs
tools-worker-1002.tools.eqiad.wmflabs
tools-worker-1003.tools.eqiad.wmflabs
tools-worker-1004.tools.eqiad.wmflabs
tools-worker-1005.tools.eqiad.wmflabs
tools-worker-1006.tools.eqiad.wmflabs
tools-worker-1007.tools.eqiad.wmflabs
tools-worker-1008.tools.eqiad.wmflabs
tools-worker-1009.tools.eqiad.wmflabs
tools-worker-1010.tools.eqiad.wmflabs
tools-worker-1011.tools.eqiad.wmflabs
tools-worker-1012.tools.eqiad.wmflabs
tools-worker-1013.tools.eqiad.wmflabs
tools-worker-1014.tools.eqiad.wmflabs
tools-worker-1015.tools.eqiad.wmflabs
tools-worker-1016.tools.eqiad.wmflabs
tools-worker-1017.tools.eqiad.wmflabs
tools-worker-1018.tools.eqiad.wmflabs
tools-worker-1019.tools.eqiad.wmflabs
tools-worker-1020.tools.eqiad.wmflabs
tools-worker-1021.tools.eqiad.wmflabs
tools-worker-1022.tools.eqiad.wmflabs
tools-worker-1023.tools.eqiad.wmflabs
tools-worker-1025.tools.eqiad.wmflabs
tools-worker-1026.tools.eqiad.wmflabs
tools-worker-1027.tools.eqiad.wmflabs
tools-worker-1028.tools.eqiad.wmflabs
Lowering the priority of this ticket, since this involves more work than originally thought.