I can no longer ssh to integration-agent-docker-1057.integration.eqiad1.wikimedia.cloud; I get a permission denied error.
I can still connect to it via Cumin from integration-cumin.integration.eqiad.wmflabs, since its key is added in /etc/ssh/userkeys/root.d, which makes me suspect an issue with LDAP.
Triggering a Puppet run via Cumin with sudo cumin --force 'name:agent-docker-1057' 'puppet agent -tv' gives:
Notice: /Stage[main]/Profile::Ci::Docker/Exec[jenkins user docker membership]/returns: usermod: user 'jenkins-deploy' does not exist
Error: '/usr/sbin/usermod -aG docker 'jenkins-deploy'' returned 6 instead of one of [0]
Error: /Stage[main]/Profile::Ci::Docker/Exec[jenkins user docker membership]/returns: change from 'notrun' to ['0'] failed: '/usr/sbin/usermod -aG docker 'jenkins-deploy'' returned 6 instead of one of [0] (corrective)
Error: Could not find user jenkins-deploy
Error: /Stage[main]/Profile::Ci::Slave::Labs::Common/File[/srv/jenkins]/owner: change from 2947 to 'jenkins-deploy' failed: Could not find user jenkins-deploy
Error: Could not find group wikidev
Error: /Stage[main]/Profile::Ci::Slave::Labs::Common/File[/srv/jenkins]/group: change from 500 to 'wikidev' failed: Could not find group wikidev
Notice: /Stage[main]/Profile::Ci::Slave::Labs::Common/File[/srv/jenkins/cache]: Dependency File[/srv/jenkins] has failures: true
...
Error: Could not find user jenkins-deploy
Error: /Stage[main]/Profile::Ci::Slave::Labs::Common/File[/srv/home/jenkins-deploy]/owner: change from 2947 to 'jenkins-deploy' failed: Could not find user jenkins-deploy
Error: Could not find group wikidev
Error: /Stage[main]/Profile::Ci::Slave::Labs::Common/File[/srv/home/jenkins-deploy]/group: change from 500 to 'wikidev' failed: Could not find group wikidev
The jenkins-deploy user is defined in LDAP https://ldap.toolforge.org/user/jenkins-deploy with uid 2947.
So I guess either the LDAP configuration on that instance is broken somehow, or the instance can't reach LDAP?
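One way to narrow this down might be to check directly on the instance whether the account and group resolve through the system's name service at all. The following is only a minimal sketch: the helper name nss_lookup is hypothetical, and it assumes Python 3 is available on the instance. It uses the standard pwd and grp modules, which ask the system's account database rather than talking to LDAP directly.

```python
import grp
import pwd


def nss_lookup(user: str, group: str) -> dict:
    """Report whether a user and a group resolve on this host.

    Returns a dict with the numeric uid and gid, or None for each
    entry that the system's account database cannot resolve.
    """
    result = {}
    try:
        result["uid"] = pwd.getpwnam(user).pw_uid
    except KeyError:
        # The host does not know this user
        result["uid"] = None
    try:
        result["gid"] = grp.getgrnam(group).gr_gid
    except KeyError:
        # The host does not know this group
        result["gid"] = None
    return result


if __name__ == "__main__":
    # Names taken from the Puppet errors above
    print(nss_lookup("jenkins-deploy", "wikidev"))
```

If both values come back as None on this instance while they resolve on a working agent, that would be consistent with the instance not getting accounts from LDAP (client configuration or connectivity) rather than a problem with the jenkins-deploy account itself.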