
Can't ssh to oxygen.rcm.eqiad.wmflabs
Closed, Resolved, Public

Description

I'm not able to ssh into the instance. ssh prompts me for a password. Other instances are OK.

The log is full of:

[ *** ] A start job is running for /etc/rc.local Compatibili... 4s / no limit)
2016-11-01T20:02:50.078275+00:00 oxygen diamond[541]: sudo: ldap_start_tls_s(): Can't contact LDAP server
2016-11-01T20:02:50.080232+00:00 oxygen diamond[541]: sudo: unable to resolve host oxygen
2016-11-01T20:02:50.080379+00:00 oxygen nslcd[643]: [ed7263] <group/member="puppet"> ldap_start_tls_s() failed (uri=ldap://ldap-labs.eqiad.wikimedia.org:389): Can't contact LDAP server
2016-11-01T20:02:50.080544+00:00 oxygen nslcd[643]: [ed7263] <group/member="puppet"> failed to bind to LDAP server ldap://ldap-labs.eqiad.wikimedia.org:389: Can't contact LDAP server
2016-11-01T20:02:50.080688+00:00 oxygen nslcd[643]: [ed7263] <group/member="puppet"> ldap_start_tls_s() failed (uri=ldap://ldap-labs.codfw.wikimedia.org:389): Can't contact LDAP server
2016-11-01T20:02:50.080826+00:00 oxygen nslcd[643]: [ed7263] <group/member="puppet"> failed to bind to LDAP server ldap://ldap-labs.codfw.wikimedia.org:389: Can't contact LDAP server
2016-11-01T20:02:50.080961+00:00 oxygen nslcd[643]: [ed7263] <group/member="puppet"> no available LDAP server found: Can't contact LDAP server
2016-11-01T20:02:50.081096+00:00 oxygen nslcd[643]: [ed7263] <group/member="puppet"> no available LDAP server found: Server is unavailable
2016-11-01T20:02:50.091554+00:00 oxygen nslcd[643]: [dcc233] <group/member="Debian-exim"> no available LDAP server found: Server is unavailable: Resource temporarily unavailable
2016-11-01T20:02:50.091736+00:00 oxygen nslcd[643]: [dcc233] <group/member="Debian-exim"> no available LDAP server found: Server is unavailable: Resource temporarily unavailable
2016-11-01T20:02:50.104893+00:00 oxygen diamond[541]: sudo: ldap_start_tls_s(): Can't contact LDAP server
2016-11-01T20:02:50.108226+00:00 oxygen diamond[541]: sudo: unable to resolve host oxygen

and

2016-11-01T20:03:24.802319+00:00 oxygen puppet-agent[1006]: Could not request certificate: Connection refused - connect(2) for "" port 8140

Event Timeline

@yuvipanda actually fixed it, so I can log in now.

yuvipanda claimed this task.

Somehow the instance's /etc/resolv.conf ended up as:

root@oxygen:~# cat /etc/resolv.conf 
domain rcm.
search rcm. 
nameserver 
options timeout:5 ndots:2

which has no nameserver address, so name resolution failed. I fixed it by hand and ran puppet, and things were fine.
spooooookyyyy!
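
For anyone who hits the same symptom, a quick check for this kind of breakage could look like the sketch below. It is only an illustration, not something that was run on this instance: the function name `nameserver_problems` is hypothetical, and it only inspects `nameserver` lines, flagging ones with a missing or unparsable address.

```python
import ipaddress


def nameserver_problems(text):
    """Return a list of problems with the nameserver lines in resolv.conf-style text.

    Hypothetical helper for illustration; it ignores every directive other
    than `nameserver`.
    """
    problems = []
    usable = 0
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments (resolv.conf allows '#' and ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] != "nameserver":
            continue
        if len(fields) < 2:
            problems.append(f"line {lineno}: nameserver has no address")
            continue
        try:
            ipaddress.ip_address(fields[1])
        except ValueError:
            problems.append(f"line {lineno}: invalid address {fields[1]!r}")
        else:
            usable += 1
    if usable == 0:
        problems.append("no usable nameserver entries")
    return problems
```

Feeding it the contents of /etc/resolv.conf (for example via `open("/etc/resolv.conf").read()`) returns an empty list when at least one valid nameserver is configured and nothing is malformed. For the file quoted above it would report the empty `nameserver` line on line 3.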