dhclient overwrites /etc/resolv.conf
Closed, Resolved (Public)

Description

I noticed repeated Puppet updates to /etc/resolv.conf, but /var/log/syslog showed only changes in one "direction" (here for tools-dev):

Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) 
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) --- /etc/resolv.conf#0112015-03-23 08:27:57.160205067 +0000
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +++ /tmp/puppet-file20150323-20208-ri9403-0#0112015-03-23 08:33:05.240215730 +0000
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) @@ -1,3 +1,9 @@
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +## THIS FILE IS MANAGED BY PUPPET
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +##
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +## source: modules/base/resolv.conf.labs.erb
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +## from:   base::resolving
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content)  domain eqiad.wmflabs
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) -search eqiad.wmflabs
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +search eqiad.wmflabs labs.eqiad.wmnet
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +options timeout:5 ndots:2
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content)  nameserver 10.68.16.1
Mar 23 08:33:05 tools-dev puppet-agent[20208]: FileBucket got a duplicate file {md5}62f5c3a6299680b7d0be8120fb03fa84
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]) Filebucketed /etc/resolv.conf to puppet with sum 62f5c3a6299680b7d0be8120fb03fa84
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) content changed '{md5}62f5c3a6299680b7d0be8120fb03fa84' to '{md5}aab4d07473a21395f5bee957079fae9b'
[…]
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) 
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) --- /etc/resolv.conf#0112015-03-23 17:45:16.764937164 +0000
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +++ /tmp/puppet-file20150323-14358-s8oskv-0#0112015-03-23 17:53:19.856940248 +0000
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) @@ -1,3 +1,9 @@
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +## THIS FILE IS MANAGED BY PUPPET
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +##
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +## source: modules/base/resolv.conf.labs.erb
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +## from:   base::resolving
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content)  domain eqiad.wmflabs
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) -search eqiad.wmflabs
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +search eqiad.wmflabs labs.eqiad.wmnet
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) +options timeout:5 ndots:2
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content)  nameserver 10.68.16.1
Mar 23 17:53:20 tools-dev puppet-agent[14358]: FileBucket got a duplicate file {md5}62f5c3a6299680b7d0be8120fb03fa84
Mar 23 17:53:20 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]) Filebucketed /etc/resolv.conf to puppet with sum 62f5c3a6299680b7d0be8120fb03fa84
Mar 23 17:53:20 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) content changed '{md5}62f5c3a6299680b7d0be8120fb03fa84' to '{md5}aab4d07473a21395f5bee957079fae9b'

The modification times of the overwritten /etc/resolv.conf (the "---" lines in the diffs above) coincide with dhclient lease renewals:

Mar 23 08:27:56 tools-dev dhclient: DHCPREQUEST of 10.68.16.8 on eth0 to 10.68.16.1 port 67
Mar 23 08:27:57 tools-dev dhclient: DHCPACK of 10.68.16.8 from 10.68.16.1
Mar 23 08:27:57 tools-dev dhclient: bound to 10.68.16.8 -- renewal in 33439 seconds.
[…]
Mar 23 17:45:16 tools-dev dhclient: DHCPREQUEST of 10.68.16.8 on eth0 to 10.68.16.1 port 67
Mar 23 17:45:16 tools-dev dhclient: DHCPACK of 10.68.16.8 from 10.68.16.1
Mar 23 17:45:16 tools-dev dhclient: bound to 10.68.16.8 -- renewal in 39803 seconds.

So it appears that dhclient, when saving the nameserver information received via DHCP to /etc/resolv.conf, does not just update the nameserver directive but replaces the whole file.
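
The correlation above can be checked mechanically. The following is a minimal sketch, not a tool that was used for this task: it takes a few of the syslog lines quoted above, extracts the DHCPACK times and the old-file mtimes from the Puppet diff headers, and pairs those that fall within a small tolerance. The function name, the regular expressions, and the two-second tolerance are assumptions made for illustration; it also assumes the host logs in UTC, which matches the +0000 offsets in the diff headers.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the syslog excerpts in this task.
SYSLOG = """\
Mar 23 08:27:57 tools-dev dhclient: DHCPACK of 10.68.16.8 from 10.68.16.1
Mar 23 08:33:05 tools-dev puppet-agent[20208]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) --- /etc/resolv.conf#0112015-03-23 08:27:57.160205067 +0000
Mar 23 17:45:16 tools-dev dhclient: DHCPACK of 10.68.16.8 from 10.68.16.1
Mar 23 17:53:19 tools-dev puppet-agent[14358]: (/Stage[main]/Base::Resolving/File[/etc/resolv.conf]/content) --- /etc/resolv.conf#0112015-03-23 17:45:16.764937164 +0000
"""

# Syslog timestamp at the start of a dhclient DHCPACK line.
DHCPACK_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ dhclient: DHCPACK ")
# mtime of the old file in Puppet's diff header ("#011" is syslog's escaped tab).
MTIME_RE = re.compile(r"--- /etc/resolv\.conf#011(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)")


def correlate(log_text, year=2015, tolerance=timedelta(seconds=2)):
    """Return (file_mtime, dhcpack_time) pairs that lie within `tolerance`.

    Syslog timestamps carry no year, so it is passed in explicitly.
    """
    acks, mtimes = [], []
    for line in log_text.splitlines():
        m = DHCPACK_RE.match(line)
        if m:
            acks.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
            continue
        m = MTIME_RE.search(line)
        if m:
            mtimes.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return [(mt, ack) for mt in mtimes for ack in acks if abs(mt - ack) <= tolerance]


for mtime, ack in correlate(SYSLOG):
    print(f"resolv.conf modified {mtime} ~ DHCPACK {ack}")
```

On the sample lines, both file modifications pair up with a DHCPACK in the same second, which is what points at dhclient as the writer.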

Googling suggests that this is part of a greater Ubuntu scheme called "resolvconf" (cf. resolvconf(8)) where options like ndots & Co. need to go in /etc/resolvconf/resolv.conf.d/tail and search domains are a bit more complicated (all unconfirmed).

I'll play around with it for a bit to see what sticks.

Event Timeline

scfc claimed this task.
scfc raised the priority of this task from to Low.
scfc updated the task description.
scfc added a project: Cloud-Services.
scfc subscribed.
coren raised the priority of this task from Low to High. Mar 25 2015, 2:27 PM
coren subscribed.

Ew. Forgot that labs actually use dhclient unlike prod. This will need to be fixed before we can switch to designate.

We may be able to change the default dns server on labnet1001's dhcp server. That will fix part of this.

The remainder can (I think) be fixed by adding appropriate settings to head/tail/whatever.

Change 200648 had a related patch set uploaded (by Tim Landscheidt):
WIP: Snapshot

https://gerrit.wikimedia.org/r/200648

The problem with the whole setup (resolvconf and dhclient, that is) is that it does a lot of DWIM, so I find it hard to get a consistent state to migrate to. resolvconf doesn't work unless /etc/resolv.conf is a symlink to its own copy; however, it has not created that copy on a number of Tools instances, probably because the order of installing resolvconf, Puppet overwriting /etc/resolv.conf, etc. has been rather random. /sbin/ifdown eth0; /sbin/ifup eth0 will reliably trigger dhclient-script to run, but whether resolvconf is then executed depends on conditions I haven't pinned down. There is also an --enable-updates option that is barely documented and has side effects of its own.

TL;DR: It's complicated.

I forgot to add: /run, where resolvconf stores its copy, is a temporary file system, so the state of an instance also depends on whether it was rebooted after the Puppet change that set /etc/resolv.conf was merged.
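
To survey which of these states an instance is actually in, something like the following could be run on each host. This is only a sketch: the function name and the state labels are made up here, and /run/resolvconf/resolv.conf is assumed to be where resolvconf keeps its generated copy (the comments above only say that it lives under /run).

```python
import os

# Assumption: resolvconf's generated file lives here; the task only says "/run".
RESOLVCONF_COPY = "/run/resolvconf/resolv.conf"


def resolv_conf_state(path="/etc/resolv.conf", target=RESOLVCONF_COPY):
    """Classify how `path` relates to resolvconf's own copy at `target`."""
    if not os.path.lexists(path):
        return "missing"
    if not os.path.islink(path):
        # E.g. a plain file written by Puppet or by dhclient-script.
        return "not a symlink"
    if os.path.realpath(path) != os.path.realpath(target):
        return "symlink elsewhere"
    if not os.path.exists(target):
        # Link is in place, but the copy is gone (for example after a
        # reboot emptied the temporary file system).
        return "dangling symlink"
    return "resolvconf symlink"
```

Calling `resolv_conf_state()` with no arguments inspects the real /etc/resolv.conf; both paths are parameters so the check can be tried against scratch files first.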

Change 200595 had a related patch set uploaded (by Andrew Bogott):
Make labs resolv.conf play nice with resolvconf

https://gerrit.wikimedia.org/r/200595

I would love to just puppetize the file, but is murdering resolvconf enough? It seems like lots of other services can trigger random updates to resolv.conf.

Also, how bitter am I that resolvconf has a --disable-updates argument but that updates can't be disabled in /etc/default?

modules/labs_vmbuilder/files/postinst.sh:  rm -f /etc/resolv.conf
modules/base/manifests/resolving.pp:        file { '/etc/resolv.conf':
modules/base/manifests/resolving.pp:                'labs'  => template('base/resolv.conf.labs.erb'),
modules/base/manifests/resolving.pp:                default => template('base/resolv.conf.erb'),
modules/labs_bootstrapvz/files/labs-jessie.manifest.yaml:      - ['chroot', '{root}', 'rm', '-f', '/etc/resolv.conf']

Change 200999 had a related patch set uploaded (by Andrew Bogott):
Disable automatic updating of resolv.conf

https://gerrit.wikimedia.org/r/200999

I have some faith in that last patch -- hopefully no one will find it too offensive.

We could disable it like you did in that patch with nodnsupdate, but I don't like plastering yet another layer over dhclient, resolvconf, and whatever else the Ubuntu and Debian gods have come up with :-).

(I assume Labs is stable enough again to spin up test instances?)

I'm open to any suggestions -- I've spent all day trying to do it the 'right way' cooperating with dhclient and I'm done.

Change 200999 merged by Andrew Bogott:
Disable automatic updating of resolv.conf

https://gerrit.wikimedia.org/r/200999

Change 200595 abandoned by Andrew Bogott:
Make labs resolv.conf play nice with resolvconf

Reason:
Abandoned in favor of murdering resolvconf entirely. https://gerrit.wikimedia.org/r/#/c/200999/

https://gerrit.wikimedia.org/r/200595

Change 200648 abandoned by Tim Landscheidt:
WIP: Snapshot

Reason:
I26b3eb07c98cb09cbe75b80a8c22736a35ac4232 overwrites /sbin/resolvconf, so any further iteration of this change would need to force a reinstall of the resolvconf package. Probably somehow possible with Puppet, but no fun.

https://gerrit.wikimedia.org/r/200648

scfc reassigned this task from scfc to Andrew.
scfc set Security to None.

Verified on tools-exec-13 (Precise), tools-exec-21 (Trusty) and toolsbeta-jessie-test2 (Jessie) that ifdown eth0; ifup eth0 does not overwrite /etc/resolv.conf.
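
A check like the one above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure that was actually used: the helper names are invented, the file is compared by SHA-256 digest before and after, and `bounce_eth0` needs root and briefly takes the interface down, so it is defined here but never called.

```python
import hashlib
import subprocess


def file_digest(path):
    """Return the SHA-256 hex digest of the file's current content."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def survives(path, action):
    """Run `action` and report whether the content of `path` is unchanged."""
    before = file_digest(path)
    action()
    return file_digest(path) == before


def bounce_eth0():
    """Take eth0 down and up again, as in the verification above.

    Requires root and interrupts networking, so run it from the console
    rather than over SSH through eth0.
    """
    subprocess.run("ifdown eth0; ifup eth0", shell=True, check=True)
```

On an instance, `survives("/etc/resolv.conf", bounce_eth0)` returning `True` would correspond to the result reported above. Note that a digest comparison only detects content changes; a rewrite with identical content would go unnoticed.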