
puppet function ipresolve unable to look up instance on labs-puppetmaster
Closed, Invalid · Public

Description

I'm getting this error from Puppet on all phabricator project instances when they call ipresolve in modules/scap/manifests/target.pp:98:

Error: Could not retrieve catalog from remote server:
Error 400 on SERVER: DNS lookup failed for phab-tin.phabricator.eqiad.wmflabs
Resolv::DNS::Resource::IN::A
at /etc/puppet/modules/scap/manifests/target.pp:98 on node phab-01.phabricator.eqiad.wmflabs

@chasemp did some digging and found that it's not just Puppet: DNS resolution for labs instances fails system-wide on the labs puppetmaster:

<chasemp> root@labcontrol1001:~# host phab-tin.phabricator.eqiad.wmflabs
<chasemp> Host phab-tin.phabricator.eqiad.wmflabs not found: 3(NXDOMAIN)

Event Timeline


@chasemp, can you check other hosts that we know work like bastion-01.bastion.eqiad.wmflabs?

Paladox triaged this task as High priority. Jun 30 2016, 6:21 PM
Paladox added subscribers: Dzahn, Paladox.

> @chasemp, can you check other hosts that we know work like bastion-01.bastion.eqiad.wmflabs?

Same result. I'm reasonably sure this has never worked in this fashion, and that uses of ipresolve() for labs hosts have historically hung off of in-project puppetmasters. Why it is this way I'm not sure; it could be an oversight, or it could be deliberate for reasons I can't imagine at the moment. I'm going to ask andrew next week, but this isn't blocking anyone for now, as I believe mukunda worked around it.

> @chasemp, can you check other hosts that we know work like bastion-01.bastion.eqiad.wmflabs?

> Why it is this way I'm not sure, it could be oversight or explicit for reasons I can't imagine atm.

What DNS server does labcontrol1001 have in /etc/resolv.conf?

Might be related: the CI instances have random DNS resolution failures, and they only have a single resolver (208.80.155.118, labs-recursor0). See T137460.

In puppet we have modules/openstack/templates/liberty/nova/dnsmasq-nova.conf.erb :

#Clients should use the designate-backed dns server rather than dnsmasq
dhcp-option=option:dns-server,<%= @recursor_ip %>

Potentially, if @recursor_ip turns out to be a list/array, we could sort the entries and join them with commas so that clients receive every resolver.
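A minimal sketch of that sort-and-join idea, rendered with Ruby's stdlib ERB outside of Puppet. The one-line template is a hypothetical stand-in for the dhcp-option line in dnsmasq-nova.conf.erb, the local variable recursor_ip stands in for the template's @recursor_ip, and the 10.0.0.x addresses are placeholders. Array() wraps a single string so both the scalar and the list case render through the same code path.

```ruby
require 'erb'

# Hypothetical stand-in for the dhcp-option line in dnsmasq-nova.conf.erb.
# Array() turns a single string into a one-element array, so scalar and
# list values are handled the same way.
TEMPLATE = "dhcp-option=option:dns-server,<%= Array(recursor_ip).sort.join(',') %>\n"

# Render the option line for one resolver (String) or several (Array).
def render_dns_option(recursor_ip)
  ERB.new(TEMPLATE).result_with_hash(recursor_ip: recursor_ip)
end

# Single resolver, as in the current template.
puts render_dns_option('208.80.155.118')
# Several resolvers (placeholder addresses), sorted and comma-joined.
puts render_dns_option(['10.0.0.2', '10.0.0.1'])
```

Note that sort here compares the addresses as strings, which is enough to keep the rendered file stable between Puppet runs but is not a numeric ordering of IPs.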

You need to pass $::nameservers[0] to ipresolve for lookups to work in labs.

yuvipanda@fearless:~/code/puppet$ git grep ipresolv | grep nameserver

Because otherwise it hits the DNS server in /etc/resolv.conf which is prod DNS which doesn't know about labs instances...
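To illustrate why the explicit nameserver matters, here is a rough sketch using Ruby's stdlib Resolv, which is where the Resolv::DNS::Resource::IN::A in the error above comes from. This is not the actual ipresolve implementation: resolve_a and its resolver: keyword are hypothetical names introduced only for this example. With no nameserver, Resolv::DNS falls back to /etc/resolv.conf, which is the behaviour described above.

```ruby
require 'resolv'

# Hypothetical helper, not the real ipresolve: look up the A record for
# `name`. With no nameserver, Resolv::DNS reads /etc/resolv.conf; with one,
# only that server is queried. `resolver:` lets a caller inject a resolver
# object (useful for testing without network access).
def resolve_a(name, nameserver = nil, resolver: nil)
  resolver ||= nameserver ? Resolv::DNS.new(nameserver: [nameserver]) : Resolv::DNS.new
  resolver.getresource(name, Resolv::DNS::Resource::IN::A).address.to_s
rescue Resolv::ResolvError
  # Raised when the queried server has no answer for the name.
  raise "DNS lookup failed for #{name}"
end

# Example calls (commented out because they need network access):
#   resolve_a('phab-tin.phabricator.eqiad.wmflabs')                    # uses /etc/resolv.conf
#   resolve_a('phab-tin.phabricator.eqiad.wmflabs', '208.80.155.118')  # asks the labs recursor
```

Assuming the default resolver is production DNS, the first call would fail with the same kind of "DNS lookup failed" error as in the task description, while the second asks a server that knows about labs instances.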

I too agree this is terrible.

Can we have production DNS delegate the wmflabs/testlabs TLD and reverse zones to labs/labtest DNS servers, like what's done for corp?