Creation of instances broken
Closed, ResolvedPublic

Description

After replacing old instances, it is not possible to log in to the newly created instances. The connection either times out or fails with Permission denied (publickey). (trusty)
See also the diagram: https://tools.wmflabs.org/nagf/?project=rcm
Maybe this could be the reason:

2015-09-20T12:45:03.442418+00:00 rcm-2 rc.local[371]: Error: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:03.442703+00:00 rcm-2 puppet-agent[681]: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:13.443805+00:00 rcm-2 rc.local[371]: Error: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:13.444185+00:00 rcm-2 puppet-agent[681]: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:23.445928+00:00 rcm-2 rc.local[371]: Error: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:23.446245+00:00 rcm-2 puppet-agent[681]: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:28.164958+00:00 rcm-2 puppet-agent[631]: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:29.368466+00:00 rcm-2 salt-minion[654]: [ERROR   ] This master address: 'None' was previously resolvable but now fails to resolve! The previously resolved ip addr will continue to be used
2015-09-20T12:45:29.369155+00:00 rcm-2 salt-minion[654]: [WARNING ] Master hostname: None not found. Retrying in 30 seconds
2015-09-20T12:45:33.447348+00:00 rcm-2 rc.local[371]: Error: Could not request certificate: Connection refused - connect(2) for "" port 8140
2015-09-20T12:45:33.447684+00:00 rcm-2 puppet-agent[681]: Could not request certificate: Connection refused - connect(2) for "" port 8140

This is the console output of rcm-2; the console repeats these messages in a loop. A reboot did not change this. This was a jessie instance.
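
The repeated "Connection refused ... port 8140" lines mean the puppet agent's TCP connection attempt was rejected. The empty host "" and the salt master address 'None' suggest that the instance never received its master hostnames, though that is an inference from the log and not something confirmed in this task. A minimal Python sketch for checking, from an instance, whether a given host accepts TCP connections on a port; the function name `can_connect` and the host in the usage note are placeholders, not part of any real tooling:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_connect("puppetmaster.example", 8140)` would return False for as long as the agent keeps logging "Connection refused" against that (hypothetical) host.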

Event Timeline

Luke081515 raised the priority of this task to Unbreak Now!.
Luke081515 updated the task description.
Luke081515 added a project: Cloud-Services.
Luke081515 subscribed.

This also happens with jessie and precise instances; for example, here is the console output of a precise instance:

err: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
Sep 21 11:30:44  puppet-agent[1367]: last message repeated 4 times
Sep 21 11:30:44 rcm-1 lldpd[853]: asroot_gethostbyname: [priv]: unable to get system  name
err: Could not request certificate: Connection refused - connect(2)
Sep 21 11:30:50 rcm-1 puppet-agent[1367]: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
Sep 21 11:31:14  puppet-agent[1367]: last message repeated 2 times
Sep 21 11:31:14 rcm-1 lldpd[853]: asroot_gethostbyname: [priv]: unable to get system name
err: Could not request certificate: Connection refused - connect(2)
Sep 21 11:31:20 rcm-1 puppet-agent[1367]: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)
Sep 21 11:31:44  puppet-agent[1367]: last message repeated 2 times
Sep 21 11:31:44 rcm-1 lldpd[853]: asroot_gethostbyname: [priv]: unable to get system name
err: Could not request certificate: Connection refused - connect(2)
Sep 21 11:31:50 rcm-1 puppet-agent[1367]: Could not request certificate: Connection refused - connect(2)
err: Could not request certificate: Connection refused - connect(2)

The new instances have the same names as recently-deleted instances, yes? If so, I'd advise deleting the original instances and then waiting several minutes (maybe 10-15) before recreating the new ones. There are some races in the code which are made worse by wikitech declaring victory well before the instances are actually deleted.
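
The advice above amounts to: confirm the old instance is really gone, allow a further grace period, and only then recreate. A rough Python sketch of that wait loop, under assumptions: `instance_exists` is a hypothetical callable supplied by the caller (nothing here talks to wikitech or OpenStack), and the default grace period of 15 minutes simply mirrors the upper end of the suggested 10-15 minutes.

```python
import time
from typing import Callable


def wait_until_deleted(
    instance_exists: Callable[[], bool],  # hypothetical check supplied by the caller
    grace_seconds: float = 15 * 60,       # mirrors the "10-15 minutes" suggestion
    poll_seconds: float = 30.0,
    timeout_seconds: float = 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the instance is gone, then wait a grace period.

    Returns True when it is reasonable to recreate, False on timeout.
    """
    waited = 0.0
    while instance_exists():
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    # Deletion is reported complete; wait extra time for cleanup to settle.
    sleep(grace_seconds)
    return True
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays; in normal use the default `time.sleep` applies.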

Ok, the instances are deleted now, I will recreate them tomorrow. Thanks for that tip.

Luke081515 claimed this task.

Works now.

Restricted Application added subscribers: Jay8g, TerraCodes.