Self hosted puppetmaster is broken
Closed, Resolved, Public

Description

"Could not find resource 'Exec[compile puppet.conf]' for relationship on 'Class[Puppetmaster::Ssl]'" shows up in some places (but not all). See limn1 for an example of an instance with the error and tools-puppetmaster-01 for one without.

Event Timeline

yuvipanda raised the priority of this task from to High.
yuvipanda updated the task description.
yuvipanda added projects: SRE, Cloud-Services, Puppet.
yuvipanda added subscribers: yuvipanda, Joe, Milimetric, MaxSem.

Change 255444 had a related patch set uploaded (by MaxSem):
Don't checkout UserDailyContribs

https://gerrit.wikimedia.org/r/255444

I set up a new self-hosted puppetmaster environment today and did not hit this problem.

chasemp claimed this task.
chasemp subscribed.

I tried adding role::puppet::self to an existing trusty host and it looks like it worked.

Still happening: see limn1 for an example

any common thread for broken instances (since it doesn't seem to be universal)?

fgiunchedi lowered the priority of this task from High to Medium. Dec 1 2015, 10:33 AM

Seems to be working fine on a jessie host too. I can't see from wikitech what classes are applied to limn1; maybe that has something to do with it?

Sadly it's probably because this box is using "ubuntu-12.04-precise (deprecated 2014-04-17)". We migrated to that last time an image was deprecated, I didn't know 12.04 was deprecated too. I can migrate it, it shouldn't be a horrible amount of work, but I'd rather avoid it if it's relatively easy for y'all to fix?

As for what classes are applied, it's just role::puppet::self and role::labs::lvm::srv but it has a good amount of custom configuration.

puppet-test02.maps-team.eqiad.wmflabs is an example of this failure on an up-to-date jessie image.

@MaxSem or @Milimetric I can't access either of those instances. Could you add my wikitech user 'Filippo Giunchedi' to the project (admin)? Thanks!

scratch that, I can actually login as root

Debugged a bit further: on puppet-test02, for example, I can get past the error by explicitly including base in the node definition in site.pp (since compile puppet.conf is defined there, but is also used in modules/puppet/manifests/self/config.pp and modules/base/manifests/puppet/config.pp). I'm not sure what's actually causing it, though.
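
For illustration, here is a minimal sketch of how an error of this kind can arise. It assumes the relationship is declared with a chaining arrow. The class names demo::config and demo::ssl are hypothetical and do not reflect the actual contents of modules/puppet or modules/base:

```puppet
# Hypothetical classes for illustration only; not the real puppetmaster code.
class demo::config {
  # The exec that other classes order themselves against.
  exec { 'compile puppet.conf':
    command => '/bin/true',
  }
}

class demo::ssl {
  # SSL setup would go here.
}

# Only demo::ssl is included, so Exec['compile puppet.conf'] never
# enters the catalog for this node.
include demo::ssl

# The relationship names a resource that is missing from the catalog,
# so compilation is expected to fail while resolving it.
Exec['compile puppet.conf'] -> Class['demo::ssl']
```

In this sketch, adding include demo::config would put the exec into the catalog and let the relationship resolve, which is analogous to the include base workaround described above. It does not explain why the class was missing on the affected instances.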

I just realized that the site.pp usage described in the comment above is the reason puppet-test02 is having problems. Labs has an LDAP ENC; reusing site.pp to override/overload the node configuration is not only unsupported, it is guaranteed to cause problems.

I can't help but wonder whether this is also connected to the limn1 instance. As far as I can see, the last commit over there also messes with site.pp, importing a passwords.pp that it also adds. However, that namespace is also provided by labs/private. I can't help but think that this is connected.

Guys if that instance causes too many problems, I can try to puppetize it so it doesn't need to be self-hosted. Right now it seems there are other instances with problems, but my offer stands, let me know.

puppet-test02 can be considered fixed btw

So was the problem just including things in site.pp vs including it via LDAP?

More precisely it was the fact that the node was defined in both site.pp and LDAP.

bd808 subscribed.

WP:BOLD'ly closing this stale task. The LDAP enc is long gone now.