
Instances broken on initial provision with dns setup issues
Closed, Resolved · Public

Description

Example at https://wikitech.wikimedia.org/wiki/Nova_Resource:Nlwiki.wikitextexp.eqiad.wmflabs

Initial Puppet provisioning had this error:

2016-02-11T04:00:52.552212+00:00 nlwiki rc.local[361]: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Reading data from Wikitextexp failed: TypeError: Data retrieved from Wikitextexp is String not Hash at /etc/puppet/manifests/realm.pp:92 on node nlwiki.wikitextexp.eqiad.wmflabs

The line in realm.pp is:

$nameservers = [ ipresolve(hiera('labs_recursor'),4), ipresolve(hiera('labs_recursor_secondary'),4) ]
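The error text suggests that the Hiera data source named Wikitextexp returned a plain string where a hash was expected. The following is a minimal Python sketch of that kind of type check, not the actual Hiera backend code: the function name `load_hiera_hash`, the use of JSON as the input format, and the sample values are all assumptions made for illustration.

```python
import json


def load_hiera_hash(source_name, raw_text):
    """Parse raw_text and insist that the top-level value is a mapping.

    Hypothetical helper: it only mimics the shape of the check that the
    Puppet error message implies.
    """
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise TypeError(
            f"Data retrieved from {source_name} is "
            f"{type(data).__name__} not dict"
        )
    return data


# A source whose top-level value is a mapping is accepted.
# The key and value here are placeholders, not real configuration.
settings = load_hiera_hash("Wikitextexp", '{"example_key": "example_value"}')

# A source whose top-level value is a bare string is rejected.
try:
    load_hiera_hash("Wikitextexp", '"just some text"')
except TypeError as err:
    message = str(err)
```

If the per-project data had been saved as free text rather than as key/value pairs, a check like this would fail on every lookup, which would be consistent with the failure at realm.pp:92.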

Non-root users are not able to ssh into the instance and the console logs are filled with:

2016-02-11T04:15:20.800181+00:00 nlwiki salt-minion[719]: [ERROR ] DNS lookup of 'labs-puppetmaster-codfw.wikimedia.org' failed.
2016-02-11T04:15:20.800481+00:00 nlwiki salt-minion[719]: [ERROR ] Master hostname: 'labs-puppetmaster-codfw.wikimedia.org' not found. Retrying in 30 seconds
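To confirm from inside an instance whether a hostname such as the one in these log lines resolves, a small check along the following lines can help. This is a sketch only: the function name `can_resolve` and its injectable `resolver` parameter are assumptions for illustration, and it is not how salt-minion itself performs the lookup.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address.

    The resolver argument defaults to the system resolver and can be
    replaced with a stub so the function is testable offline.
    """
    try:
        resolver(hostname, None)
    except socket.gaierror:
        # Name resolution failed (unknown host, no DNS server, etc.).
        return False
    return True
```

Calling `can_resolve("labs-puppetmaster-codfw.wikimedia.org")` on an affected instance would be expected to return `False` while the salt-minion errors above are being logged.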

Event Timeline

bd808 created this task. Feb 11 2016, 4:40 AM
bd808 raised the priority of this task to Needs Triage.
bd808 updated the task description.
bd808 added a project: Cloud-Services.
bd808 added subscribers: bd808, ssastry.
Restricted Application added subscribers: StudiesWorld, Aklapper. Feb 11 2016, 4:40 AM
ssastry added a comment. Edited Feb 11 2016, 2:06 PM

I deleted this instance and spun up a new one. The new one failed with the same permission denied issue.

Change 271019 had a related patch set uploaded (by Andrew Bogott):
Don't configure a secondary labs salt master on new labs instances.

https://gerrit.wikimedia.org/r/271019

Change 271019 merged by Andrew Bogott:
Don't configure a secondary labs salt master on new labs instances.

https://gerrit.wikimedia.org/r/271019

chasemp triaged this task as Medium priority. Apr 4 2016, 2:35 PM

Given the change that Andrew merged back in February, new instances should no longer be attempting to connect to a secondary salt master. Does this issue still occur? If not, can we close this task?

AlexMonk-WMF closed this task as Resolved. Aug 4 2016, 12:05 AM

Assuming this was fixed then.