puppetmaster hostcert and hostprivkey point to nonexistent files
Open, Stalled, Normal, Public

Description

The hostcert and hostprivkey settings in puppet.conf on the puppetmasters point to files that don't exist. For example on puppetmaster1001:

puppetmaster1001:~# cat /etc/puppet/puppet.conf | grep ^host
hostcert = /var/lib/puppet/server/ssl/certs/puppetmaster1001.eqiad.wmnet.pem
hostprivkey = /var/lib/puppet/server/ssl/private_keys/puppetmaster1001.eqiad.wmnet.pem

puppetmaster1001:~# ls /var/lib/puppet/server/ssl/certs/puppetmaster1001.eqiad.wmnet.pem
ls: cannot access /var/lib/puppet/server/ssl/certs/puppetmaster1001.eqiad.wmnet.pem: No such file or directory

puppetmaster1001:~# ls /var/lib/puppet/server/ssl/private_keys/puppetmaster1001.eqiad.wmnet.pem
ls: cannot access /var/lib/puppet/server/ssl/private_keys/puppetmaster1001.eqiad.wmnet.pem: No such file or directory

A misconfiguration here is potentially dangerous. For example the debian puppet-master-passenger package post-install script checks if the configured hostcert file exists to determine if the puppet CA should be initialized.

# /var/lib/dpkg/info/puppet-master-passenger.postinst
#
# Initialize the puppet master CA and generate the master
# certificate only if the host doesn't already have any puppet
# ssl certificate.  The ssl key and cert need to be available
# (eg generated) before apache2 is configured and started
# since apache2 ssl configuration uses the puppet master ssl
# files.
if [ ! -e "$(puppet master --configprint hostcert)" ]; then
    puppet cert generate $(puppet master --configprint certname)
fi
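
The guard above can be modelled with a small Ruby sketch. The helper name `postinst_would_generate_cert?` is hypothetical, and the path is passed in directly instead of being read from `puppet master --configprint hostcert`. It only illustrates that a hostcert setting pointing at a missing file makes the guard true, in which case the script would go on to run `puppet cert generate`.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring the shell test `[ ! -e "$hostcert" ]`:
# true means the postinst script would go on to generate a certificate.
def postinst_would_generate_cert?(hostcert_path)
  !File.exist?(hostcert_path)
end

Dir.mktmpdir do |dir|
  # Illustrative paths only; they are not the real puppetmaster paths.
  missing = File.join(dir, 'server', 'ssl', 'certs', 'example.pem')
  present = File.join(dir, 'example.pem')
  File.write(present, 'dummy certificate')

  puts postinst_would_generate_cert?(missing) # configured file does not exist
  puts postinst_would_generate_cert?(present) # configured file exists
end
```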

Event Timeline

herron created this task. Oct 26 2017, 5:43 PM

Change 386666 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] Puppet: Change hostcert and hostprivkey paths on puppetmasters

https://gerrit.wikimedia.org/r/386666

@akosiaris following up to your comment in https://gerrit.wikimedia.org/r/#/c/386666/

Unless I am mistaken (the documentation alludes to that, but unfortunately does not make it explicit [1]), these 2 settings are important only in the [agent] (and of course [main]) sections, not in [master]. Which explains why we have had no problem with this up to now.
[1] https://puppet.com/docs/puppet/4.8/configuration.html
I would test instead fully removing them.

What do you think about the if condition in the puppet-master-passenger package post install script above? It seems in this case the hostcert setting is being used by puppet master.

Is it though? Looking at https://github.com/puppetlabs/puppet/search?q=hostprivkey&type=Code&utf8=%E2%9C%93 I don't get a single hit showing how that variable is used. The results are all acceptance and unit tests.

https://github.com/puppetlabs/puppet/search?utf8=%E2%9C%93&q=hostcert&type=Code isn't much different. I see usage in https://github.com/puppetlabs/puppet/blob/d973bdedb510cc1648e739ccfe28d4d9bcffa6de/lib/puppet/ssl/validator/default_validator.rb but it's about creating an HTTP connection, which is what the agent normally does.

These ^ searches are against the master branch, but checking tag 3.6.1 does not yield anything more useful.

Change 386666 merged by Herron:
[operations/puppet@production] Puppet: Change hostcert and remove hostprivkey master settings

https://gerrit.wikimedia.org/r/386666

herron added a comment. Nov 8 2017, 8:07 PM

After deploying the updated hostcert setting in https://gerrit.wikimedia.org/r/386666 rhodium began logging two types of "PuppetDB" errors:

Failed to submit 'replace facts' command for <node redacted> to PuppetDB at nitrogen.eqiad.wmnet:443: undefined method `content' for nil:NilClass
Could not retrieve facts for <node redacted>: Failed to find facts from PuppetDB at nitrogen.eqiad.wmnet:443: undefined method `content' for nil:NilClass

This caused agent runs that were routed to rhodium (via puppetmaster1001 apache balancer) to fail, and triggered a huge number of failed run alerts. Other puppetmasters with the updated hostcert setting worked without issue.

The change has since been reverted and rhodium depooled (via puppetmaster1001 apache).

puppetcompiler1001's puppet agent is configured to connect directly to rhodium (through the puppetmaster1001.eqiad.wmnet vhost) for troubleshooting.

Observations and questions:

  1. When hostcert is set to the existing certificate /var/lib/puppet/ssl/certs/rhodium.eqiad.wmnet.pem the above PuppetDB errors are produced.
  2. The original hostcert setting refers to a missing file /var/lib/puppet/server/ssl/certs/rhodium.eqiad.wmnet.pem but works. Why does this work?
  3. Similar to above, removing (or commenting out) the hostcert setting makes it refer to a different but also missing file, /var/lib/puppet/server/ssl/certs/puppet.pem, yet this works too. Why does this work?
  4. The value of hostprivkey appears to have no effect. The outcome is the same if it is set, unset, or refers to a nonexistent file.
  5. Re-generating rhodium's certificate makes no difference.
  6. Likewise, deactivating rhodium and re-generating the certificate again makes no difference.
  7. Both "Failed to find facts" and "Failed to submit 'replace facts'" error messages can be found in the PuppetDB repository indicating we should rule out an issue with PuppetDB.
    1. https://github.com/puppetlabs/puppetdb/search?utf8=✓&q=%22Failed+to+submit%22&type=
    2. https://github.com/puppetlabs/puppetdb/search?utf8=✓&q=%22Failed+to+find+facts%22&type=

Joe added a comment. Nov 9 2017, 8:53 AM

I was able to extract a semi-meaningful backtrace from rhodium:

Nov  9 08:44:22 rhodium puppet-master[3889]: undefined method `content' for nil:NilClass
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/ssl/validator/default_validator.rb:126:in `setup_connection'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/network/http/nocache_pool.rb:14:in `with_connection'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:218:in `with_connection'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:173:in `block in request_with_redirects'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:170:in `upto'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:170:in `request_with_redirects'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/network/http/connection.rb:87:in `post'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/puppetdb/command.rb:49:in `block in submit'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:108:in `profile'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/puppetdb/command.rb:47:in `submit'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:86:in `block in submit_command'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:108:in `profile'
Nov  9 08:44:22 rhodium puppet-master[3889]: /usr/lib/ruby/vendor_ruby/puppet/util/puppetdb.rb:83:in `submit_command'

So the problem happens deep into the setup of an ssl connection - specifically in this function of the default validator:

# Registers the instance's call method with the connection.
#
# @param [Net::HTTP] connection The connection to validate
#
# @return [void]
#
# @api private
#
def setup_connection(connection)
  if ssl_certificates_are_present?
    connection.cert_store = @ssl_host.ssl_store
    connection.ca_file = @ssl_configuration.ca_auth_file
    connection.cert = @ssl_host.certificate.content
    connection.key = @ssl_host.key.content
    connection.verify_mode = OpenSSL::SSL::VERIFY_PEER
    connection.verify_callback = self
  else
    connection.verify_mode = OpenSSL::SSL::VERIFY_NONE
  end
end

with the error happening because @ssl_host.certificate is nil.

This code path is reached only when ssl_certificates_are_present? evaluates to true, that is when

def ssl_certificates_are_present?
  Puppet::FileSystem.exist?(Puppet[:hostcert]) && Puppet::FileSystem.exist?(@ssl_configuration.ca_auth_file)
end

which explains why this only happens when the hostcert setting points to a file that actually exists, but not why @ssl_host.certificate is nil on rhodium in this situation and not on the other masters.
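
The branch selection described above can be sketched in isolation. This is a simplified stand-in, not the Puppet implementation: `verify_mode_for` is a hypothetical helper that takes both paths as plain arguments, and only the two `OpenSSL::SSL` constants are real library names.

```ruby
require 'openssl'
require 'tempfile'

# Simplified stand-in for the branch in setup_connection: peer verification
# is only selected when both files exist on disk.
def verify_mode_for(hostcert_path, ca_auth_file)
  if File.exist?(hostcert_path) && File.exist?(ca_auth_file)
    OpenSSL::SSL::VERIFY_PEER
  else
    OpenSSL::SSL::VERIFY_NONE
  end
end

Tempfile.create('hostcert') do |cert|
  Tempfile.create('ca') do |ca|
    # Both files exist: verification would be attempted.
    p verify_mode_for(cert.path, ca.path) == OpenSSL::SSL::VERIFY_PEER
    # hostcert points at a missing file: verification is silently skipped.
    p verify_mode_for("#{cert.path}.missing", ca.path) == OpenSSL::SSL::VERIFY_NONE
  end
end
```

Under this model, a hostcert setting that points at a nonexistent file never reaches the lines that dereference the certificate, which is consistent with the old setting "working".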

Joe added a comment. Nov 9 2017, 11:25 AM

Mystery solved: in the method that @ssl_host.certificate calls, that is Puppet::SSL::Host.certificate, we have

return nil unless @certificate = Certificate.indirection.find(name)

and rhodium had a spurious self-generated ca.pem in /var/lib/puppet/server/ssl/certs/ca.pem; removing it magically solved the issue.

So summarizing:

  • Any HTTPS request puppet makes creates a Puppet::SSL::Validator::DefaultValidator instance.
  • If the hostcert setting in the master section of puppet.conf points to a non-existent file, and/or the ca.pem file is not present in the ssldir set up for the master, no verification of the SSL connection happens.
  • If both are present (as was the case for rhodium, which for unknown reasons had a ca.pem file under /var/lib/puppet/server/ssl/certs), validation is attempted.
  • The validator instance reads the client cert information from the singleton Puppet::SSL::Host.localhost (referred to as @ssl_host above).
  • The Puppet::SSL::Host.certificate method contains the snippet above, and since the certificate gets generated and stored in the agent's configured ssldir, it doesn't find it under /var/lib/puppet/server/ssl/certs. Note that the indirection.find method is used here instead of the hostcert setting, which is what the earlier presence check uses.
  • Since no cert is found, nil is returned and accessing nil.content fails.
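
The mismatch summarized above can be reproduced with a self-contained sketch. Everything here is a hypothetical model: `FakeCert`, `find_certificate` and `client_cert_content` are illustrative names, the directory layout is created in a temporary directory, and the by-name lookup only loosely mimics what `Certificate.indirection.find(name)` does.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for a certificate object exposing #content.
FakeCert = Struct.new(:content)

# Hypothetical lookup by name inside a certificate directory; returns nil
# when no file for that name exists.
def find_certificate(certdir, name)
  path = File.join(certdir, "#{name}.pem")
  File.exist?(path) ? FakeCert.new(File.read(path)) : nil
end

# The presence check looks at the hostcert path, but the certificate itself
# is looked up by name in a (possibly different) directory.
def client_cert_content(hostcert_path, ca_file, lookup_dir, name)
  return :no_verification unless File.exist?(hostcert_path) && File.exist?(ca_file)

  find_certificate(lookup_dir, name).content
end

Dir.mktmpdir do |root|
  agent_certs  = File.join(root, 'ssl', 'certs')
  server_certs = File.join(root, 'server', 'ssl', 'certs')
  FileUtils.mkdir_p([agent_certs, server_certs])

  hostcert = File.join(agent_certs, 'host.example.pem')
  File.write(hostcert, 'agent certificate')
  ca_file = File.join(server_certs, 'ca.pem')

  # No ca.pem under the server ssldir: verification is skipped.
  p client_cert_content(hostcert, ca_file, server_certs, 'host.example')

  # A stray ca.pem appears: the check passes, but the lookup returns nil.
  File.write(ca_file, 'stray CA')
  begin
    client_cert_content(hostcert, ca_file, server_certs, 'host.example')
  rescue NoMethodError => e
    puts "#{e.class}: #{e.message}"
  end
end
```

In this model the failure disappears either when the stray ca.pem is removed or when the lookup directory and the hostcert path agree, which matches the behaviour reported for rhodium.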

This is my current understanding. To be honest, I could not find a combination of configs on a puppet 3.x host that both sets up SSL verification of these calls correctly (hostcert and hostprivkey set to existing files, and the CA file on disk) and keeps a different ssldir for the server and the client, given our current setup with frontend and backend servers. I also strongly suspect that this wouldn't work correctly on puppet4 either, on any host that's not a puppetmaster frontend and/or a CA server.

I would say we can continue not to add the ca.pem file to the master's ssldir for now, and revisit the issue once we've migrated, as having no SSL verification in connections to puppetdb is pretty sad.
At this point, I'm not even sure it's a good idea having a separate ssldir setting for the master and the agent.

Joe added a comment. Nov 9 2017, 11:40 AM

In all this, some random person revoked puppetmaster1001's own certificate, which is used to access the ca_server, as far as I understand, which cannot be good.

We have to fix that.

herron changed the task status from Open to Stalled. Nov 9 2017, 7:18 PM

For the time being we're going to leave the hostcert setting alone and work around it during puppetmaster upgrades by manually adjusting and reverting the setting while a system is depooled.

Will re-visit after completing the upgrade to puppet 4.

herron added a comment. Nov 9 2017, 8:53 PM

T180167 created for the revoked puppetmaster1001 certificate