
project-local puppetmasters getting reset to labs-puppetmaster
Closed, Resolved (Public)

Description

This has apparently happened twice: once about 27 hours ago (on 2019-03-28 at around noon UTC) and once on 2019-03-21 (also sometime close to noon UTC).

My guess is that something is causing the ENC (external node classifier) to fail, so that the default hiera values get applied on a puppet run. It /should/ just error out when that happens and not complete the puppet run, but the evidence suggests otherwise.
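To make the suspected failure mode concrete, here is a minimal Python sketch of the difference between a lookup that silently falls back to defaults and one that errors out. It is only an illustration of the hypothesis above, not the actual Puppet or ENC code: the `puppetmaster` key, `fetch_enc`, and both function names are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical default; the value is the central master seen in the logs.
DEFAULT_HIERA: Dict[str, str] = {"puppetmaster": "labs-puppetmaster.wikimedia.org"}


class EncLookupError(RuntimeError):
    """Raised when the ENC cannot be queried."""


def resolve_fail_open(fetch_enc: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Suspected behaviour: an ENC failure silently yields the defaults."""
    try:
        overrides = fetch_enc()
    except Exception:
        overrides = {}  # the failure is swallowed, so defaults win
    return {**DEFAULT_HIERA, **overrides}


def resolve_fail_closed(fetch_enc: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Desired behaviour: an ENC failure aborts instead of applying defaults."""
    try:
        overrides = fetch_enc()
    except Exception as exc:
        raise EncLookupError("ENC lookup failed; refusing to continue") from exc
    return {**DEFAULT_HIERA, **overrides}
```

With the fail-open variant, a single failed lookup is enough to hand an instance the default master, which matches the symptom reported here.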

Let's see if this happens more.

Event Timeline

To be specific, 9 deployment-prep instances were reset yesterday, one of them around 11:59:31:

Mar 28 11:59:31 deployment-urldownloader02 puppet-agent[28084]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) -server = deployment-puppetmaster03.deployment-prep.eqiad.wmflabs
Mar 28 11:59:31 deployment-urldownloader02 puppet-agent[28084]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) +server = labs-puppetmaster.wikimedia.org

I believe the others were around the same time frame but did not check them all.
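For checking the remaining instances, something like the following Python sketch could pull the reset events out of collected syslog lines. It assumes the log format quoted above (a `-server = ...` diff line followed by a `+server = ...` line for `10-main.conf`); the `Reset` type and `find_resets` function are names made up for this example.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

# Matches the puppet-agent diff lines for 10-main.conf as quoted in this task.
SERVER_CHANGE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\spuppet-agent\[\d+\]:\s"
    r"\(.*File\[/etc/puppet/puppet\.conf\.d/10-main\.conf\]/content\)\s"
    r"(?P<sign>[+-])server = (?P<server>\S+)"
)


class Reset(NamedTuple):
    timestamp: str
    host: str
    old_server: str
    new_server: str


def find_resets(
    lines: Iterable[str], central: str = "labs-puppetmaster.wikimedia.org"
) -> List[Reset]:
    """Return one Reset per host whose server line was switched to `central`."""
    removed: Dict[str, str] = {}  # host -> server named on the '-' diff line
    resets: List[Reset] = []
    for line in lines:
        match = SERVER_CHANGE.match(line)
        if match is None:
            continue
        host = match["host"]
        if match["sign"] == "-":
            removed[host] = match["server"]
        elif match["server"] == central and host in removed:
            resets.append(Reset(match["ts"], host, removed.pop(host), central))
    return resets
```

Feeding it the concatenated syslog of all deployment-prep instances should list each affected host with the time of the change, which makes it easy to see whether they cluster around the same minute.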

According to syslog, the project puppetmaster appears to have started shortly before the problem:
Mar 28 11:59:29 deployment-puppetmaster03 puppet-master[1682]: Starting Puppet master version 4.8.2
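To quantify "shortly before", a small Python helper can compare the syslog timestamps of the master start and an agent reset. This is just a sketch: syslog lines carry no year, so 2019 is assumed from the dates in the description, `%b` parsing assumes an English/C locale, and `syslog_time` and `seconds_between` are names invented here.

```python
from datetime import datetime


def syslog_time(line: str, year: int = 2019) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def seconds_between(earlier_line: str, later_line: str, year: int = 2019) -> float:
    """Seconds elapsed from the first line's timestamp to the second's."""
    delta = syslog_time(later_line, year) - syslog_time(earlier_line, year)
    return delta.total_seconds()
```

For the puppet-master start line (11:59:29) and the deployment-urldownloader02 reset line (11:59:31), this returns 2.0, so the reset landed about two seconds after the master came up.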

This has happened again:

(24) deployment-acme-chief[03-04].deployment-prep.eqiad.wmflabs,deployment-aqs[01-03].deployment-prep.eqiad.wmflabs,deployment-cache-text05.deployment-prep.eqiad.wmflabs,deployment-cache-upload05.deployment-prep.eqiad.wmflabs,deployment-chromium02.deployment-prep.eqiad.wmflabs,deployment-cumin02.deployment-prep.eqiad.wmflabs,deployment-db06.deployment-prep.eqiad.wmflabs,deployment-docker-citoid01.deployment-prep.eqiad.wmflabs,deployment-docker-cxserver01.deployment-prep.eqiad.wmflabs,deployment-docker-mathoid01.deployment-prep.eqiad.wmflabs,deployment-hadoop-test-[1-3].deployment-prep.eqiad.wmflabs,deployment-imagescaler03.deployment-prep.eqiad.wmflabs,deployment-maps05.deployment-prep.eqiad.wmflabs,deployment-ms-be[05-06].deployment-prep.eqiad.wmflabs,deployment-ms-fe03.deployment-prep.eqiad.wmflabs,deployment-poolcounter05.deployment-prep.eqiad.wmflabs,deployment-prometheus02.deployment-prep.eqiad.wmflabs,deployment-urldownloader02.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'grep labs-puppet...ppet/puppet.conf' -----                                                                                                                                                 
server = labs-puppetmaster.wikimedia.org
May 22 13:39:01 deployment-acme-chief03 CRON[27438]: (root) CMD (/usr/local/sbin/puppet-run > /dev/null 2>&1)
May 22 13:39:02 deployment-acme-chief03 puppet-agent-cronjob: Sleeping 20 for random splay
May 22 13:39:26 deployment-acme-chief03 puppet-agent[27821]: Downgrading to PSON for future requests
May 22 13:39:26 deployment-acme-chief03 puppet-agent[27821]: Using configured environment 'production'
May 22 13:39:26 deployment-acme-chief03 puppet-agent[27821]: Retrieving pluginfacts
May 22 13:39:26 deployment-acme-chief03 puppet-agent[27821]: Retrieving plugin
May 22 13:39:26 deployment-acme-chief03 puppet-agent[27821]: Loading facts
May 22 13:39:36 deployment-acme-chief03 puppet-agent[27821]: Caching catalog for deployment-acme-chief03.deployment-prep.eqiad.wmflabs
May 22 13:39:36 deployment-acme-chief03 puppet-agent[27821]: Applying configuration version '1558532370'
May 22 13:39:37 deployment-acme-chief03 puppet-agent[27821]: Computing checksum on file /etc/apt/sources.list.d/project-aptly.list
May 22 13:39:37 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Apt/File[/etc/apt/sources.list.d/project-aptly.list]) Filebucketed /etc/apt/sources.list.d/project-aptly.list to puppet with sum 359c83a1139d09149269ef9819af28cf
May 22 13:39:37 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Apt/File[/etc/apt/sources.list.d/project-aptly.list]/ensure) removed
May 22 13:39:37 deployment-acme-chief03 crontab[27908]: (root) LIST (root)
May 22 13:39:37 deployment-acme-chief03 crontab[27911]: (root) LIST (prometheus)
May 22 13:39:37 deployment-acme-chief03 crontab[27914]: (root) LIST (acme-chief)
May 22 13:39:39 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Nrpe/Package[nagios-plugins]/ensure) created
May 22 13:39:39 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Nrpe/Package[nagios-plugins-basic]/ensure) created
May 22 13:39:39 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Nrpe/Package[nagios-plugins-standard]/ensure) created
May 22 13:39:40 deployment-acme-chief03 puppet-agent[27821]: openstack::clientpackages::vms::mitaka::buster: no special configuration yet
May 22 13:39:40 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Openstack::Clientpackages::Vms::Mitaka::Buster/Notify[openstack::clientpackages::vms::mitaka::buster: no special configuration yet]/message) defined 'message' as 'openstack::clientpackages::vms::mitaka::buster: no special configuration yet'
May 22 13:39:40 deployment-acme-chief03 puppet-agent[27821]: The LDAP client stack for this host is: classic
May 22 13:39:40 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message) defined 'message' as 'The LDAP client stack for this host is: classic'
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) 
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) --- /etc/puppet/puppet.conf.d/10-main.conf#0112019-04-16 18:09:54.137878345 +0000
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) +++ /tmp/puppet-file20190522-27821-dcuw4f#0112019-05-22 13:39:41.326089802 +0000
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) @@ -11,7 +11,7 @@
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content)  factpath = $vardir/lib/facter
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content)  
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content)  [agent]
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) -server = deployment-puppetmaster03.deployment-prep.eqiad.wmflabs
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) +server = labs-puppetmaster.wikimedia.org
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content)  
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content)  
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content)  daemonize = false
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: Computing checksum on file /etc/puppet/puppet.conf.d/10-main.conf
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]) Filebucketed /etc/puppet/puppet.conf.d/10-main.conf to puppet with sum 73ac1d853f8c1aeb622e53a03efe0b07
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content) content changed '{md5}73ac1d853f8c1aeb622e53a03efe0b07' to '{md5}a782d5f4005e93bc461131c7e16def71'
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]) Scheduling refresh of Exec[delete master certs]
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]) Scheduling refresh of Exec[compile puppet.conf]
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Exec[delete master certs]) Triggered 'refresh' from 1 event
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: Computing checksum on file /etc/ssh/userkeys/root
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Passwords::Root/Ssh::Userkey[root]/File[/etc/ssh/userkeys/root]) Filebucketed /etc/ssh/userkeys/root to puppet with sum 701b5950ec0373eb918970a97aa64605
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Passwords::Root/Ssh::Userkey[root]/File[/etc/ssh/userkeys/root]/content) content changed '{md5}701b5950ec0373eb918970a97aa64605' to '{md5}af9fc71bb296ceacbc6ad11ff022a3ef'
May 22 13:39:41 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Base::Puppet/Exec[compile puppet.conf]) Triggered 'refresh' from 1 event
May 22 13:39:42 deployment-acme-chief03 ssh-agent[23648]: debug2: fd 4 setting O_NONBLOCK
May 22 13:39:42 deployment-acme-chief03 ssh-agent[23648]: debug1: process_message: socket 1 (fd=4) type 11
May 22 13:39:42 deployment-acme-chief03 ssh-agent[23648]: debug1: process_message: socket 1 (fd=4) type 13
May 22 13:39:43 deployment-acme-chief03 puppet-agent[27821]: (/Stage[main]/Acme_chief::Server/Exec[/usr/local/bin/acme-chief-certs-sync]/returns) executed successfully
May 22 13:39:44 deployment-acme-chief03 puppet-agent[27821]: Applied catalog in 7.93 seconds
Bstorm subscribed.

I wonder whether this has been fixed by now...

bd808 claimed this task.
bd808 subscribed.

No new reports here since last May. I am pretty sure that if this were an ongoing problem, it would have happened again by now.