
New instances attached to a role::puppetmaster::standalone Puppetmaster need manual changes after switching from the default Puppetmaster
Closed, Duplicate, Public

Description

After starting a new instance in a labs project whose puppetmaster points to a role::puppetmaster::standalone instance, one has to delete /var/lib/puppet/ssl before Puppet will function. Until that directory is deleted, puppet agent -tv gives errors like this:

Warning: SSL_connect returned=1 errno=0 state=error: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster.deployment-prep.eqiad.wmflabs]

100% reproducible on labs projects that have a role::puppetmaster::standalone puppetmaster (e.g. deployment-prep, integration, tools). There is no fix other than retrying (either deleting and rebuilding the instance, or deleting a bunch of files at random).

Workaround

See https://wikitech.wikimedia.org/wiki/Help:Standalone_puppetmaster:

Agent:

$ sudo -i puppet agent -tv
$ sudo rm -fR /var/lib/puppet/ssl
$ sudo rm /var/lib/puppet/server/ssl/ca/signed/$(hostname -f).pem
$ sudo -i puppet agent -tv

Master:

$ sudo puppet cert clean <fqdn of instance>

Agent:

$ sudo -i puppet agent -tv
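
The first block of agent-side commands above could be wrapped in a small helper that defaults to a dry run. This is only a minimal sketch: the function name reset_agent_ssl and the DRY_RUN variable are hypothetical and not part of any existing tooling, and the paths are simply the ones quoted in this task.

```shell
# Hypothetical helper: prints the first block of agent-side workaround
# commands, or runs them when DRY_RUN=0 is set explicitly.
# Paths are copied from the workaround above, not verified elsewhere.
reset_agent_ssl() {
    local fqdn="${1:-$(hostname -f)}"
    local run="echo"
    # Only execute for real when DRY_RUN is explicitly set to 0.
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=""
    fi
    $run sudo -i puppet agent -tv
    $run sudo rm -fR /var/lib/puppet/ssl
    $run sudo rm /var/lib/puppet/server/ssl/ca/signed/"${fqdn}".pem
    $run sudo -i puppet agent -tv
}
```

Calling reset_agent_ssl with no arguments prints the four commands for the local host so they can be reviewed first. The puppet cert clean step on the master and the final agent run still have to be done separately.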

Event Timeline

hashar renamed this task from After starting new deployment-prep instance, have to delete /var/lib/puppet/ssl before puppet will function to New instance have broken puppet configuration when using puppetmaster standalone. Nov 19 2016, 7:53 PM
hashar triaged this task as High priority.
hashar updated the task description.
hashar added subscribers: mobrovac, hashar.

@mobrovac here is the task related to the instance you could not boot last week.

Note that after deleting /var/lib/puppet/ssl, we still have to do a puppet cert clean on the puppet master. The next puppet run generates a new certificate but then keeps failing with:

Error: /Stage[main]/Base::Certificates/Sslcert::Ca[Puppet_Internal_CA]/File[/usr/local/share/ca-certificates/Puppet_Internal_CA.crt]: Could not evaluate: Could not retrieve information from environment production source(s) file:/var/lib/puppet/client/ssl/certs/ca.pem
Notice: /Stage[main]/Sslcert/Exec[update-ca-certificates]: Dependency File[/usr/local/share/ca-certificates/Puppet_Internal_CA.crt] has failures: true
Warning: /Stage[main]/Sslcert/Exec[update-ca-certificates]: Skipping because of failed dependencies

Puppet provisions the Puppet_Internal_CA.crt file from /var/lib/puppet/client/ssl/certs/ca.pem; however, the rm -fR /var/lib/puppet/client/ssl gets rid of it. The lame fix is to copy the CA from another place:

mkdir -p /var/lib/puppet/client/ssl/certs/
cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs/ca.pem
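
The copy above could be made a little safer by checking that the source exists and by not overwriting a CA file that is already in place. A minimal sketch, assuming the two paths quoted above as defaults; the function name restore_client_ca is hypothetical.

```shell
# Hypothetical helper: copy the CA certificate into the client SSL
# directory only when the source exists and the destination is missing.
# Default paths are the ones from the comment above.
restore_client_ca() {
    local src="${1:-/var/lib/puppet/ssl/certs/ca.pem}"
    local dst="${2:-/var/lib/puppet/client/ssl/certs/ca.pem}"
    if [ ! -f "$src" ]; then
        echo "source CA not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "already present: $dst"
        return 0
    fi
    # Create the target directory if needed, then copy the CA file.
    mkdir -p "$(dirname "$dst")" && cp "$src" "$dst"
}
```

Run as root with no arguments, it uses the default paths; both can be overridden to try it out against a scratch directory first.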

This is still happening. It happened today when creating a Jessie instance for the 'integration' project.

(T152941 is slightly related, but refers to the case where the puppetmaster is switched after the initial Puppet setup.)

I tried to work around this bug by following the exact steps mentioned above and it didn't work at all. This is the fourth time I have deleted and recreated an instance. This bug is really annoying.

After fighting with this for hours, and thanks to Google and the inventor of coffee, it turned out that certificates are not getting signed by the puppetmaster, which is why the agent returns "Exiting; no certificate found and waitforcert is disabled". What needs to be done is to run "puppet cert sign <fqdn>" before and after "puppet cert clean <fqdn>"; otherwise the master can't delete the certificate or send the new one to the agent. I hope this helps to fix the underlying issue.

I might be wrong and I haven't tested this hypothesis, but the whole bug may stem from the original certificate never having been signed by the puppetmaster. Some resources: https://ask.puppet.com/question/19314/puppet-cert-clean-is-throws-error-could-not-find-a-serial-number/ https://linuxconfig.org/puppet-agent-exiting-no-certificate-found-and-waitforcert-is-disabled-solution
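
One way to check the "not signed" hypothesis on the master would be to look at how the instance shows up in the certificate list before running sign or clean. The sketch below is an assumption-laden helper, not an established tool: cert_state is a hypothetical name, and it assumes the usual puppet cert list --all layout in which signed entries start with "+", revoked ones with "-", and pending requests have no marker before the quoted hostname.

```shell
# Hypothetical helper: classify a host from "puppet cert list --all" output
# read on stdin. Assumes lines of the form:
#   + "host" (SHA256) ...   signed
#   - "host" (SHA256) ...   revoked
#     "host" (SHA256) ...   pending request
cert_state() {
    awk -v host="\"$1\"" '
        $1 == "+" && $2 == host { print "signed";  found = 1; exit }
        $1 == "-" && $2 == host { print "revoked"; found = 1; exit }
        $1 == host              { print "pending"; found = 1; exit }
        END { if (!found) print "absent" }
    '
}
```

On the master this would be used as sudo puppet cert list --all | cert_state <fqdn of instance>. A "pending" result would be consistent with the waitforcert error described above.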

I also suffered with this for a very long time. Eventually I settled on having a standalone puppetmaster (for my other instances) while keeping the puppetmaster itself an agent of the labs puppetmaster rather than of itself.

I'm not sure if that is clear, but I found that if you try to make it its own master you end up in a pickle: you are left with a certificate that is not signed by the labs puppet master (because you deleted and regenerated it). The only way to fix this is to destroy and rebuild the instance so that a new certificate is generated and automatically signed by the labs puppet master (since you can't trigger this manually).

@hashar Since the switch to using a standalone puppetmaster, this now seems to be a known and documented process for creating new instances on labs projects - https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster. Let me know if I'm missing something!

bd808 renamed this task from New instance have broken puppet configuration when using puppetmaster standalone to New instances attached to a role::puppetmaster::standalone Puppetmaster need manual changes after switching from the default Puppetmaster. Jul 12 2017, 5:59 AM
bd808 removed a project: Cloud-VPS.
bd808 updated the task description.

(T152941 is slightly related, but refers to the case where the puppetmaster is switched after the initial Puppet setup.)

This is actually the same problem in both cases, because the initial Puppet run for a new instance will always be made against the Cloud VPS shared puppetmaster. If the project or instance has an alternate puppetmaster hiera setting, that setting will be applied during the first run, changing /etc/puppet/puppet.conf in the same manner described in T152941: Make changing puppetmasters for Labs instances more easy.
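
To see whether the first run has already rewritten /etc/puppet/puppet.conf to point at the alternate puppetmaster, the server setting can be read back from the file. A rough sketch only: puppet_conf_server is a hypothetical name, and the parsing ignores section headers and simply reports the last server = line it finds. Running puppet config print server on the instance should report the value Puppet itself resolves.

```shell
# Hypothetical helper: print the value of the last "server = ..." line
# in a puppet.conf file. Section headers such as [main] or [agent]
# are ignored, so this is only a rough check.
puppet_conf_server() {
    local conf="${1:-/etc/puppet/puppet.conf}"
    awk -F= '
        /^[[:space:]]*server[[:space:]]*=/ { v = $2 }
        END {
            # Trim surrounding whitespace before printing.
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            if (v != "") print v
        }
    ' "$conf"
}
```

If the printed name is already the standalone puppetmaster while the files under /var/lib/puppet/ssl still date from the first run against the shared puppetmaster, that would match the situation described in this task.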

Thank you @bd808. It is good to see that T152941 has an explanation for the issue :]