
phabricator-stage-1001.devtools.eqiad1.wikimedia.cloud fails puppet
Closed, Resolved (Public)

Description

Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: SSL_connect returned=1 errno=0 state=error: certificate verify failed (certificate rejected): [ok for /CN=puppetmaster-1001.devtools.eqiad1.wikimedia.cloud]

Event Timeline

I have manually edited /etc/puppet/puppet.conf to point to the generic puppet master: puppetmaster.cloudinfra.wmflabs.org

Removed /var/lib/puppet/ssl

Puppet still reverts that setting on the next run, though:

Notice: /Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content: 
--- /etc/puppet/puppet.conf.d/10-main.conf	2020-11-30 08:02:57.379634719 +0000
+++ /tmp/puppet-file20201130-32108-1od6utp	2020-11-30 08:04:23.722684248 +0000
@@ -11,7 +11,7 @@
 factpath = $vardir/lib/facter
 
 [agent]
-server = puppetmaster.cloudinfra.wmflabs.org
+server = puppetmaster-1001.devtools.eqiad.wmflabs

Because puppet.git has:

hieradata/cloud/eqiad1/devtools/common.yaml:158:profile::puppetdb::master: puppetmaster-1001.devtools.eqiad.wmflabs
hieradata/cloud/eqiad1/devtools/common.yaml:159:profile::puppetmaster::common::puppetdb_host: puppetmaster-1001.devtools.eqiad.wmflabs
hieradata/cloud/eqiad1/devtools/common.yaml:166:  puppetmaster-1001.devtools.eqiad.wmflabs:
hieradata/cloud/eqiad1/devtools/common.yaml:167:    - { worker: puppetmaster-1001.devtools.eqiad.wmflabs, loadfactor: 10 }
hieradata/cloud/eqiad1/devtools/hosts/phabricator-stage-1001.yaml:1:puppetmaster: puppetmaster-1001.devtools.eqiad.wmflabs

There is something broken with the Puppet master configuration for phabricator-stage-1001 and I can't quite figure it out :-\

I have manually edited /etc/puppet/puppet.conf to point to the generic puppet master: puppetmaster.cloudinfra.wmflabs.org

Please don't. This is using our own local puppet master. And we don't manually edit config.

It was previously broken by others trying to change puppet masters, apparently.

It was broken due to some certificate issue. I tried switching to puppetmaster.cloudinfra.wmflabs.org because that sometimes fixes the cert. Puppet eventually restored the previous puppet master. Maybe there is an issue related to the instance FQDN used in the certificate? Anyway, I haven't touched anything else after my short investigation.
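
One way to test the FQDN suspicion is to compare the CN reported in the SSL warning with the server name the agent is configured to use. The following is only a rough sketch of that comparison: the helper names `cert_cn_from_warning` and `server_matches_cert` are hypothetical, and the regular expression assumes the warning has the `/CN=...` shape shown in the description. A mismatch here is a hint, not proof of the root cause.

```python
import re


def cert_cn_from_warning(line):
    """Return the CN found in a Puppet SSL warning line, or None if absent."""
    match = re.search(r"/CN=([A-Za-z0-9.-]+)", line)
    return match.group(1) if match else None


def server_matches_cert(configured_server, warning_line):
    """True when the configured server name equals the CN in the warning."""
    cn = cert_cn_from_warning(warning_line)
    # Hostnames are case-insensitive, so compare lowercased values.
    return cn is not None and cn.lower() == configured_server.lower()


# Warning text copied from the task description.
warning = (
    "Warning: SSL_connect returned=1 errno=0 state=error: certificate verify "
    "failed (certificate rejected): "
    "[ok for /CN=puppetmaster-1001.devtools.eqiad1.wikimedia.cloud]"
)

# Server name taken from the Hiera grep output in this task.
print(server_matches_cert("puppetmaster-1001.devtools.eqiad.wmflabs", warning))
```

With the values from this task, the configured name ends in `.eqiad.wmflabs` while the certificate CN ends in `.eqiad1.wikimedia.cloud`, so the two do not match.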

Change 649275 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] devtools: update puppetmaster DNS domain

https://gerrit.wikimedia.org/r/649275

I fixed it by changing the Puppetmaster FQDN:

Instance Hiera config
- puppetmaster: puppetmaster-1001.devtools.eqiad.wmflabs
+ puppetmaster: puppetmaster-1001.devtools.eqiad1.wikimedia.cloud
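
For anyone checking other Hiera files for the same kind of stale name, here is a minimal sketch that lists hostnames still using the old suffix and shows what the renamed form would be. The suffix pair is taken from the change above; the helper names `find_legacy_hosts` and `to_new_domain` are hypothetical, and it is an assumption that every such host should be renamed the same way, so review each hit by hand.

```python
import re

# Assumption: old and new suffixes as seen in the Hiera change in this task.
OLD_SUFFIX = ".eqiad.wmflabs"
NEW_SUFFIX = ".eqiad1.wikimedia.cloud"

# A dotted hostname that ends in the old suffix.
_LEGACY_HOST = re.compile(
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*" + re.escape(OLD_SUFFIX) + r"\b"
)


def find_legacy_hosts(text):
    """Return the sorted, de-duplicated hostnames in text that use OLD_SUFFIX."""
    return sorted(set(_LEGACY_HOST.findall(text)))


def to_new_domain(host):
    """Swap OLD_SUFFIX for NEW_SUFFIX; leave other hostnames unchanged."""
    if host.endswith(OLD_SUFFIX):
        return host[: -len(OLD_SUFFIX)] + NEW_SUFFIX
    return host


# Sample lines modelled on the Hiera entries quoted in this task.
sample = (
    "puppetmaster: puppetmaster-1001.devtools.eqiad.wmflabs\n"
    "    - { worker: puppetmaster-1001.devtools.eqiad.wmflabs, loadfactor: 10 }\n"
)

for host in find_legacy_hosts(sample):
    print(host, "->", to_new_domain(host))
```

The sketch only reports candidates; it does not edit any file, since Hiera changes here go through review in operations/puppet.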

Change 649275 merged by Dzahn:
[operations/puppet@production] devtools: update puppetmaster DNS domain

https://gerrit.wikimedia.org/r/649275

Thank you! I was just checking on that instance as well, and fixing it was on my list, but I did not get to it on Friday.

I see some other puppet warnings (for example related to aphlict) but the puppet run does finish :)

(The actual reason is not what was expected though)