
deployment-parsoidcache02 is not using betalabs puppetmaster
Closed, Resolved, Public

Description

It is still using virt1000, which is the general labs puppetmaster. I don't think this is intentional :) This also prevents testing puppet patches on betalabs on this machine.


Version: unspecified
Severity: normal

Details

Reference
bz73357

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:54 AM
bzimport set Reference to bz73357.

Indeed:

deployment-parsoidcache02:~$ grep server /etc/puppet/puppet.conf
server = virt1000.wikimedia.org

Although the wikitech page shows that the instance has role::puppet::self with puppetmaster=deployment-salt.eqiad.wmflabs

Puppet fails to run, though, due to an error with the varnish ganglia monitoring. Yuvi has been working on it.

I have removed the role::cache::parsoid class from the instance and ran puppet again. That unblocked puppet, and puppet.conf now points at the beta puppetmaster:

$ grep server /etc/puppet/puppet.conf
server = deployment-salt.eqiad.wmflabs
$
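The grep check used above can be wrapped in a small helper so it is easy to repeat on other instances. This is only a sketch: the function names `puppet_server` and `check_puppet_server` are made up for illustration, and it assumes puppet.conf uses a plain `server = host` line as shown in this report.

```shell
#!/bin/sh
# Hypothetical helpers, not part of puppet or the beta cluster tooling.

# Print the puppetmaster configured in a puppet.conf file.
# Usage: puppet_server [path]   (defaults to /etc/puppet/puppet.conf)
puppet_server() {
    conf="${1:-/etc/puppet/puppet.conf}"
    # Keep only the value of the first "server = host" line,
    # then strip any trailing whitespace.
    sed -n 's/^[[:space:]]*server[[:space:]]*=[[:space:]]*//p' "$conf" \
        | sed 's/[[:space:]]*$//' \
        | head -n 1
}

# Return 0 if the configured server matches the expected one, 1 otherwise.
# Usage: check_puppet_server <expected-host> [path]
check_puppet_server() {
    expected="$1"
    actual="$(puppet_server "${2:-/etc/puppet/puppet.conf}")"
    if [ "$actual" = "$expected" ]; then
        echo "OK: server is $actual"
    else
        echo "MISMATCH: server is $actual, expected $expected"
        return 1
    fi
}
```

For example, `check_puppet_server deployment-salt.eqiad.wmflabs` would report a mismatch on an instance that still points at virt1000.wikimedia.org.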

The next run of puppet complains about certificate exchange with the deployment-salt puppet master:

Info: Creating a new SSL key for i-000005bf.eqiad.wmflabs
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for i-000005bf.eqiad.wmflabs
Info: Certificate Request fingerprint (SHA256): ...
Info: Caching certificate for ca
Exiting; no certificate found and waitforcert is disabled

On deployment-salt:

# puppet ca list
i-000005bf.eqiad.wmflabs (SHA256) ...
# puppet ca sign i-000005bf.eqiad.wmflabs
Notice: Signed certificate request for i-000005bf.eqiad.wmflabs
...
#

# salt-key --list=unaccepted
Unaccepted Keys:
i-000005bf.eqiad.wmflabs
# salt-key --accept=i-000005bf.eqiad.wmflabs
The following keys are going to be accepted:
Unaccepted Keys:
i-000005bf.eqiad.wmflabs
Proceed? [n/Y] Y
Key for minion i-000005bf.eqiad.wmflabs accepted.
#
```
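The two approval steps on deployment-salt could be grouped as below. This is a sketch under assumptions: `run` and `approve_instance` are hypothetical names, and the script defaults to a dry run that only prints the commands, so nothing is signed or accepted unless `DRY_RUN=0` is set. It uses only the `puppet ca sign` and `salt-key --accept` invocations shown above.

```shell
#!/bin/sh
# Hypothetical wrapper; run on the puppet/salt master as root.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Sign the puppet certificate request and accept the salt key
# for a single instance.
# Usage: approve_instance <fqdn>
approve_instance() {
    fqdn="$1"
    if [ -z "$fqdn" ]; then
        echo "usage: approve_instance <fqdn>" >&2
        return 2
    fi
    run puppet ca sign "$fqdn"
    run salt-key --accept="$fqdn"
}
```

With `DRY_RUN=0`, `salt-key --accept` still asks for confirmation interactively, as in the transcript above.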

Reran puppet. Starting the salt-minion service fails; the upstart log in /var/log/upstart/salt-minion.log shows:

[CRITICAL] The Salt Master server's public key did not authenticate!
The master public key can be found at:
/etc/salt/pki/minion/minion_master.pub

Removing the cached master key and restarting the minion fixes it:

# rm /etc/salt/pki/minion/minion_master.pub
# rm /var/run/salt-minion.pid
# killall salt-minion
# /etc/init.d/salt-minion start
# sleep 10; /etc/init.d/salt-minion status
 * salt-minion is running
#

\O/
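The salt-minion recovery steps above can be sketched as one function for reuse on other instances with a stale master key. The names `run` and `reset_minion_master_key` are hypothetical, the paths are the ones from this report, and the default is a dry run that prints the commands without executing them.

```shell
#!/bin/sh
# Hypothetical helper; run on the affected instance as root.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Drop the cached master public key and restart the minion so that it
# fetches the key of its current salt master.
reset_minion_master_key() {
    run rm /etc/salt/pki/minion/minion_master.pub
    run rm /var/run/salt-minion.pid
    run killall salt-minion
    run /etc/init.d/salt-minion start
}
```

Set `DRY_RUN=0` to actually apply the steps, then check `/etc/init.d/salt-minion status` after a few seconds as done above.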

Another puppet run confirms everything is fine. I then reapplied the role::cache::parsoid class which finished the configuration.

The whole process is explained at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Converting_a_host_to_use_local_puppetmaster_and_salt_master

My assumption is that when the instance was created, puppet failed and never applied the switch to the beta cluster puppet and salt masters.

NOTE: I have no clue whether the service actually works, but the underlying infrastructure serving it seems to be fine.

I have upgraded varnish while at it (apt-get upgrade + restart of the varnish* services):

Unpacking varnish (3.0.6plus~x-wm3) over (3.0.5plus~x-wm7trusty1) ...

On production, the varnish upgrade is done manually, so we have to do it manually as well on beta cluster.

deployment-parsoidcache02 is now using betalabs puppetmaster and salt master. Varnish has been upgraded.

I have NOT verified whether the parsoid cache works as expected. I guess that would need another bug report.