integration-slave-jessie-1001.integration.eqiad.wmflabs can't be reached via ssh although it is working and reachable via salt :(
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | yuvipanda | T105720 Labs team reliability goal for Q1 2015/16
Resolved | | Andrew | T102240 Audit projects' use of NFS, and remove it where not necessary
Resolved | | hashar | T90610 Continuous integration should not depend on labs NFS
Resolved | | hashar | T103312 Cant ssh to integration-slave-jessie-1001.integration.eqiad.wmflabs
Event Timeline
Removed the puppet class role::ci::slave::labs, which prevented puppet from completing.
Under /home/ only /home/admin/ exists :(
Without role::ci::slave::labs puppet ran just fine. I verified resolv.conf / puppet.conf etc via salt commands. The only reason I can see for ssh to fail is that the user home directories are not created under /home/.
Seems like a lab / sshd configuration issue to me.
From /var/log/auth.log :
Jun 23 20:36:22 integration-slave-jessie-1001 sshd[6109]: Connection from 10.68.16.210 port 45790 on 10.68.16.72 port 22
Jun 23 20:36:22 integration-slave-jessie-1001 sshd[6109]: Connection closed by 10.68.16.210 [preauth]
Well that is Shinken.
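For anyone repeating this check, here is a rough sketch of how the source addresses could be pulled out of auth.log lines like the ones quoted above. The regular expression and the function name connection_sources are my own illustration, not anything shipped on the instance, and the pattern only covers the "Connection from" format shown in this task.

```python
import re

# Matches sshd "Connection from <src> port <n> on <dst> port <n>" entries.
# The pattern is an assumption based on the two log lines quoted in this task.
CONNECTION_RE = re.compile(
    r"sshd\[(?P<pid>\d+)\]: Connection from (?P<src>\S+) port (?P<src_port>\d+) "
    r"on (?P<dst>\S+) port (?P<dst_port>\d+)"
)


def connection_sources(lines):
    """Return the source IP of every 'Connection from' entry, in order."""
    sources = []
    for line in lines:
        match = CONNECTION_RE.search(line)
        if match:
            sources.append(match.group("src"))
    return sources


# The two entries from /var/log/auth.log quoted above.
sample = [
    "Jun 23 20:36:22 integration-slave-jessie-1001 sshd[6109]: "
    "Connection from 10.68.16.210 port 45790 on 10.68.16.72 port 22",
    "Jun 23 20:36:22 integration-slave-jessie-1001 sshd[6109]: "
    "Connection closed by 10.68.16.210 [preauth]",
]

print(connection_sources(sample))
```

If the bastion address never shows up in such a list, the connection is probably being dropped before it reaches sshd at all.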
So bastion-01.bastion.eqiad.wmflabs has the IP 10.68.17.232, but it is not allowed in the ferm rules! Thus the Jessie instance rejects it, although I am not sure why it has ferm rules enabled and the other CI slaves do not :-/
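To illustrate the kind of check involved, here is a minimal sketch that tests whether a source address falls inside a list of allowed networks. The allow-list below is hypothetical (I only put the Shinken address in it, since that one clearly reached sshd); it is not read from the real ferm configuration, whose rules are more involved than a flat list.

```python
import ipaddress


def is_allowed(source, allowed_networks):
    """Return True if `source` falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(source)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


# Hypothetical allow-list for illustration only, not the actual ferm rules.
allowed = ["10.68.16.210/32"]

print(is_allowed("10.68.16.210", allowed))  # Shinken address from auth.log
print(is_allowed("10.68.17.232", allowed))  # bastion-01 address
```

With that assumed list, the Shinken address passes and the bastion-01 address does not, which matches what was observed here.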
Stopped ferm service via salt and I can ssh again.
Purged the ferm package, removed /etc/ferm and reran puppet. ferm is no more, though I am pretty sure it used to be around.
Hmm, the bastion-01 IP was added to puppet when it was created. Are you sure these were running up-to-date puppet?
For some reason ferm is no longer applied on the CI instances, so it could not receive the updated rules.