Initial description for reference:
Magnus___ reported on #wikimedia-cloud that the tool home directory /data/project/genedb was not automatically created for the tool genedb. The directory was then created manually.
The directory is supposed to be created by maintain-kubeusers on tools-k8s-master-01. Besides creating the home directory, maintain-kubeusers also allocates k8s credentials for the tool and writes them to a directory named .kube inside the home directory. The .kube directory is missing:
root@tools-bastion-05:~# ls -al /data/project/genedb
total 160
drwxr-xr-x    4 tools.genedb tools.genedb  4096 May 16 15:22 .
drwxr-xr-x 2046 root         root         69632 May 16 14:55 ..
-rw-r--r--    1 tools.genedb tools.genedb 53639 May 16 20:52 access.log
-rw-------    1 tools.genedb tools.genedb   906 May 16 15:22 .bash_history
-rw-r--r--    1 tools.genedb tools.genedb    49 May 16 14:58 error.log
drwxrwxr-x    2 tools.genedb tools.genedb  4096 May 16 15:05 public_html
-r--------    1 tools.genedb tools.genedb    52 May 16 14:55 replica.my.cnf
drwxr-xr-x    2 tools.genedb tools.genedb  4096 May 16 15:00 scripts
-rw-r--r--    1 tools.genedb tools.genedb   137 May 16 14:58 service.manifest
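To see whether other tools are affected in the same way, a check along these lines could list every home directory that has no .kube directory. This is only a rough sketch: the function name homes_missing_kube is made up for illustration, and it assumes every subdirectory of the given root is a tool home.

```python
from pathlib import Path


def homes_missing_kube(project_root):
    """Return the sorted names of subdirectories that lack a .kube directory.

    Assumption: each subdirectory of project_root is one tool's home.
    """
    root = Path(project_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        # Skip plain files; only directories are treated as tool homes.
        if entry.is_dir() and not (entry / ".kube").is_dir()
    )
```

Calling homes_missing_kube("/data/project") on a host with the project tree mounted would be expected to include genedb in the state shown above.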
systemctl says maintain-kubeusers is running, but it generates no logs at all:
root@tools-k8s-master-01:~# systemctl status maintain-kubeusers
● maintain-kubeusers.service - "Create & Maintain kubernetes tool & infrastructure users"
   Loaded: loaded (/lib/systemd/system/maintain-kubeusers.service; enabled)
   Active: active (running) since Sat 2018-05-12 02:46:34 UTC; 4 days ago
 Main PID: 14152 (maintain-kubeus)
   CGroup: /system.slice/maintain-kubeusers.service
           └─14152 /usr/bin/python3 /usr/local/bin/maintain-kubeusers --infrastructure-users /etc/kubernetes/infrastructure-users --project ...

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
root@tools-k8s-master-01:~# journalctl -xu maintain-kubeusers
-- Logs begin at Wed 2018-05-16 13:50:53 UTC, end at Wed 2018-05-16 21:00:06 UTC. --
strace shows that it's probably stuck in an endless loop of 10-second sleeps:
root@tools-k8s-master-01:~# strace -p 14152
Process 14152 attached
select(0, NULL, NULL, NULL, {8, 415880}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {10, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {10, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {10, 0}^CProcess 14152 detached
 <detached ...>
Current notes:
We've since found that the maintain-kubeusers script doesn't fail hard enough when LDAP is unavailable: after a few failed tries it keeps running without any access to LDAP. In that state it can't do its job. It needs to either maintain its LDAP connection more intelligently or fail harder.
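One way to "fail harder" could look roughly like the sketch below. It is not the actual maintain-kubeusers code: run_forever, sync_once and DirectoryUnavailable are hypothetical names, with DirectoryUnavailable standing in for whatever error the LDAP client really raises. The idea is to count consecutive failures and exit non-zero once a limit is reached, so the failure shows up in the service status instead of the process sleeping forever.

```python
import logging
import time


class DirectoryUnavailable(Exception):
    """Hypothetical stand-in for the error raised when LDAP can't be reached."""


def run_forever(sync_once, interval=10, max_failures=3, sleep=time.sleep):
    """Call sync_once every `interval` seconds; exit after repeated failures.

    sync_once is assumed to open a fresh LDAP connection on each call, do one
    pass of user maintenance, and raise DirectoryUnavailable on failure.
    """
    failures = 0
    while True:
        try:
            sync_once()
            failures = 0  # a successful pass resets the counter
        except DirectoryUnavailable as exc:
            failures += 1
            logging.error("LDAP unavailable (%d/%d): %s",
                          failures, max_failures, exc)
            if failures >= max_failures:
                # Exit non-zero instead of looping without LDAP access.
                raise SystemExit(1)
        sleep(interval)
```

If the systemd unit were configured to restart on failure (an assumption, not something verified here), the exit would also give the service a fresh start with a new LDAP connection.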