Page MenuHomePhabricator

Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module
Closed, ResolvedPublic

Description

It has caused enough pain and suffering and anguish and sorrow and death. Time for it to go.

Additional reasons:

  1. It uses a completely different bit of code for all the client pupept setup - in modules/puppet rather than modules/base/puppet.
  2. It puts ssl certs for puppet in a different place than in everywhere else, causing code to have to find ways to differentiate between instances on role::puppet::self and those not.

If we can get rid of it for the puppet clients at least, I'd be happier.

Event Timeline

yuvipanda raised the priority of this task from to Needs Triage.
yuvipanda updated the task description. (Show Details)
yuvipanda added projects: SRE, Cloud-Services.
yuvipanda subscribed.

Change 256624 had a related patch set uploaded (by Yuvipanda):
puppet: Kill puppet::self::geoip

https://gerrit.wikimedia.org/r/256624

Change 256624 merged by Yuvipanda:
puppet: Kill puppet::self::geoip

https://gerrit.wikimedia.org/r/256624

Ok, so right now role::puppet::self can do one of 3 things:

  1. Be 'self hosted puppetmaster' - you have a puppetmaster that is used only for that instance
  2. Be a 'puppetmaster' that other instances can then connect to as their puppetmaster
  3. Be a custom puppet *client*, that instructs instances which puppetmaster to connect to.

These are, IMO, three different cases (or at least 2, if you count #1 to be a combo of #2 and #3). They should be split out.

Proposal:

  1. Have role::puppet::self strictly mean only scenario #1
  2. Have a different class / role that is scenario #2
  3. Scenario #3 should be possible by just setting the puppetmaster hiera variable.
This comment has been deleted.

Doing all these in a nice way should also allow us to unify the location of the SSL stuff for puppet. Right now it can be in /var/lib/puppet/ssl (if you are not using role::puppet::self), in /var/lib/puppet/client/ssl (if you are using role::puppet::self in scenario #3, or in /var/lib/puppet/server/ssl if you are using role::puppet::self in scenario #2.

Also the different reasons instances have role::puppet::self applied:

  1. For testing puppet changes
  2. To act as puppetmaster for a project, for cherry-picking / testing changes (deployment-prep / integration) or for using a private repository (tools-puppetmaster)
  3. To act as a client for a puppetmaster in a project (this requires hiera or LDAP variable to say where the puppetmaster is).

These 1, 2, 3 also roughly correspond to the scenarios #1, 2, 3 from earlier comment.

Hah, it looks like someone else already did most of the work!

I can actually just set the puppetmaster hiera variable (*not* the puppetmaster ldap variable!) and get my normal instance talking to a role::puppet::self puppetmaster, without needing to apply role::puppet::self. It does require that I manually cleanup the old certificates, so that needs fixing. This is also potentially security sensitive, since fundamentally this is just a "Your server's certificate has changed! Is this legit?" issue. A couple of options:

  1. Require manual (but scripted) action for this anyway (this is ok in tools' circumstance, since we also puppet sign manually).
  2. Add a $auto_puppetmaster_switching hiera boolean setting that adds a check to puppet-run that checks to see if the ca matches the puppetmaster. If not, clean out the old ssldir, and proceed to run puppet (which will then act like a fresh install and request certs again). Need to figure out if this can be opt-in or opt-out based on the security issues that need to be dealt with wrt just cleaning out an old cert...

In addition, this requires setting a hiera variable, which is painful right now in some cases due to lack of role based hiera lookup in labs. I've filed T120165 for that.

(I have tested both 1 and 2 and they both work)

chasemp triaged this task as Medium priority.Dec 3 2015, 2:22 PM
chasemp set Security to None.

Change 256890 had a related patch set uploaded (by Yuvipanda):
base: Allow auto puppetmaster switching tuning

https://gerrit.wikimedia.org/r/256890

Once ^ gets merged, I'll have to find list of all instances that have role::puppet::self *and* the puppetmaster variable or hiera defined, and fix them all.

Change 257171 had a related patch set uploaded (by Yuvipanda):
k8s: Stop including puppet::self related classes

https://gerrit.wikimedia.org/r/257171

Change 257171 merged by Yuvipanda:
k8s: Stop including puppet::self related classes

https://gerrit.wikimedia.org/r/257171

hashar renamed this task from Kill the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module to Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module.Jun 17 2016, 8:38 PM

I'm going to pick up *some* of this next week, specifically stop allowing using role::puppet::self from a purely 'client' configuration. This involves:

  1. Finding all projects that do this
  2. Manually cherry-pick a patch on their puppetmasters
  3. Run puppet, make sure it's using code from base/puppet rather than puppet/
  4. Make sure certs and stuff is properly setup (/var/lib/puppet/ssl rather than /var/lib/puppet/client/ssl)
  5. When all of this is done, merge said patch
  6. Remove code from role::puppet::self that allows the $master to be set to anything other than itself.

This means that if people want to point a random labs instance to another puppetmaster, they can just set the 'puppetmaster' hiera variable, re-sign certs, and are good to go. No need for role::puppet::self.

Projects that need to be touched:

  1. deployment-prep
  2. integration
  3. mailman
  4. wikidata-query
  5. striker
  6. servermon
  7. toolsbeta
  8. shinken
  9. etcd
  10. couple instances in analytics.

I moved striker, and @bd808 reports it is all good. the instance with this issue in the shinken project (shinken-02) has actually been deleted. The 'mailman' project is completely empty.

servermon is done. I'll do the analytics ones next.

The analytics project doesn't actually seem to have any! These were just stale LDAP entries for instances that no longer exist. I've deleted them.

did toolsbeta, or at least the toolsbeta instances that are still sshable. Many didn't come back up after a restart, I'm going to slowly back away from that and let it be.

So it seems the list is now:

  • deployment-prep
  • integration
  • wikidata-query
  • etcd

The analytics project doesn't actually seem to have any! These were just stale LDAP entries for instances that no longer exist. I've deleted them.

T134025 ?

Mentioned in SAL [2016-08-30T20:52:07Z] <yuvipanda> cherry-picked appropriate patch on deployment-puppetmaster for T120159, did https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep/host/deployment-puppetmaster&oldid=818847 to make sure the puppetmaster allows connections from elsewhere

deployment-prep is done! \o/

None of the instances have duplicates in their puppet.conf either! \o/

Only things left are etcd and integration.

Hi @Joe. Do you still intend on using the etcd project as is? I tried looking through it, puppet has been disabled in a bunch of instances and in some (the master) puppet was stuck for a few months.

Change 311163 had a related patch set uploaded (by Yuvipanda):
labs: Add a standalone puppetmaster role

https://gerrit.wikimedia.org/r/311163

Change 311163 merged by Yuvipanda:
labs: Add a standalone puppetmaster role

https://gerrit.wikimedia.org/r/311163

Change 307656 had a related patch set uploaded (by Yuvipanda):
wmflib: Return default dir when role::puppet::self isn't used

https://gerrit.wikimedia.org/r/307656

Change 307656 merged by Yuvipanda:
wmflib: Return default dir when role::puppet::self isn't used

https://gerrit.wikimedia.org/r/307656

So, https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster exists now. It is based off the puppetmaster module, and the same role is used for the labs puppetmaster as well! Some more cleanup needs to happen, but next steps are:

  1. Minor code cleanup!
  2. Move deployment-prep to standalone puppetmaster (this might require dealing with confd)
  3. Move tools to standalone puppetmaster
  4. Find other places role::puppet::self is being used as a projectwide puppetmaster, and move those
  5. Test and document how to setup standalone puppetmaster for easy development
  6. Mark role::puppet::self as deprecated
  7. ????
  8. PROFIT!

On my self-hosted puppetmaster using role::puppet::self I've ended up having two stanzas for [agent], one pointing to labs-puppetmaster-eqiad.wikimedia.org and the second one pointing to localhost. After unspeakable changes to puppet.conf and the various hiera attributes I've messed up the instance to the point of deciding to start again from scratch with role::puppetmaster::standalone.

I've unsuccessfully tried to make my new self-hosted puppetmaster use role::puppetmaster::standalone. Here is the story of my attempt.

  1. New medium jessie instance created on https://wikitech.wikimedia.org/wiki/Special:NovaInstance under the puppet project.
  2. Made sure no weird attribute was set under https://wikitech.wikimedia.org/wiki/Hiera:Puppet/host/<instance-name>
  3. Logged into the instance, puppet agent -t till the run resulted in a noop
  4. Configured the instance on wikitech by clicking on 'configure'. Selected role::puppetmaster::standalone.
  5. Once again, made sure puppet did its thing
  6. Set role::puppetmaster::standalone::autosign: true, puppet agent -t
  7. Set puppetmaster: hostname --fqdn in hiera
  8. mv /var/lib/puppet/ssl /var/tmp/
  9. puppet agent -t, resulting in "Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key"
  10. puppet suggested to remove the certificate from both the master and the agent and run puppet again. This resulted in a new cert being created and puppet failing with: Error: Could not request certificate: SSL_connect returned=1 errno=0 state=error: certificate verify failed: [certificate revoked for /CN=puppet-ema.puppet.eqiad.wmflabs]
  11. Useless random sequence of puppet cert clean $fqdn and new cert requests
  12. This comment

On my self-hosted puppetmaster using role::puppet::self I've ended up having two stanzas for [agent], one pointing to labs-puppetmaster-eqiad.wikimedia.org and the second one pointing to localhost.

Sounds like T132689: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster

When I tried to use role::puppetmaster::standalone last week I was unable to start apache on the new puppetmaster; it complained about missing cert files.

Change 315526 had a related patch set uploaded (by Yuvipanda):
horizon: Show role::puppetmaster::standalone instead of role::puppet::self

https://gerrit.wikimedia.org/r/315526

Change 315564 had a related patch set uploaded (by Yuvipanda):
puppet: Add a message announcing deprecation to role::puppet::self

https://gerrit.wikimedia.org/r/315564

Change 315526 merged by Andrew Bogott:
horizon: Show role::puppetmaster::standalone instead of role::puppet::self

https://gerrit.wikimedia.org/r/315526

Change 315564 merged by Andrew Bogott:
puppet: Add a message announcing deprecation to role::puppet::self

https://gerrit.wikimedia.org/r/315564

The issues with role::puppetmaster::standalone not being able to be its own client are fixed now! http://wikitech.wikimedia.org/wiki/Standalone_puppetmaster has been updated

Change 256890 merged by Andrew Bogott:
base: Allow auto puppetmaster switching tuning

https://gerrit.wikimedia.org/r/256890

Change 324610 had a related patch set uploaded (by Andrew Bogott):
base: Allow auto puppetmaster switching tuning

https://gerrit.wikimedia.org/r/324610

Change 324610 merged by Andrew Bogott:
base: Allow auto puppetmaster switching tuning

https://gerrit.wikimedia.org/r/324610

The puppet module is still present - although documentaiton has been updated to point people to the puppetmaster module based role instead. Not sure if we can just close this, or wait for it to actually die...