
Standalone puppet masters are broken (uninstallable packages)
Closed, Resolved · Public

Description

I tried to create a standalone puppetmaster following the docs at https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster.

After applying role::puppetmaster::standalone, the puppet run fails with:

Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for puppet-jmm-puppet-master.puppet.eqiad.wmflabs
Info: Applying configuration version '1491574126'
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install puppetmaster' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 puppetmaster : Depends: puppet-master but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Error: /Stage[main]/Puppetmaster/Package[puppetmaster]/ensure: change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install puppetmaster' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 puppetmaster : Depends: puppet-master but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Notice: Finished catalog run in 11.53 seconds

It seems some of the involved packages are now uninstallable:

jmm@puppet-jmm-puppet-master:~$ sudo apt-get install puppet-master
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 puppet-master : Depends: puppet (= 4.8.2-3~bpo8+1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
jmm@puppet-jmm-puppet-master:~$ sudo apt-get install puppet-master puppet
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 puppet : Breaks: facter (< 2.4.0~) but 2.2.0-1 is to be installed
E: Unable to correct problems, you have held broken packages.
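The last error comes down to Debian version ordering: facter's 2.2.0-1 sorts before 2.4.0~, so the `Breaks` relation declared by the 4.8 puppet package applies. The same rules make a `~bpo8` backport sort before the matching stable release. Below is a minimal Python sketch of the comparison described in Debian Policy §5.6.12 (epoch first, then upstream version, then Debian revision; `~` sorts before everything, even the end of the string). The function names are our own and this is an illustration rather than dpkg's code; on a real system, `dpkg --compare-versions` is the authoritative check.

```python
def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run ("" means end of string)."""
    if c == "" or "0" <= c <= "9":
        return 0
    if "a" <= c <= "z" or "A" <= c <= "Z":
        return ord(c)            # letters sort before other punctuation
    if c == "~":
        return -1                # tilde sorts before everything, even the end
    return ord(c) + 256          # '+', '.', '-', ... sort after letters


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream versions, or two Debian revisions."""
    def at(s: str, k: int) -> str:
        return s[k] if k < len(s) else ""

    def is_digit(c: str) -> bool:
        return c != "" and "0" <= c <= "9"

    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Non-digit run: compare character weights one by one.
        while (at(a, i) and not is_digit(at(a, i))) or (at(b, j) and not is_digit(at(b, j))):
            diff = _order(at(a, i)) - _order(at(b, j))
            if diff:
                return diff
            i += 1
            j += 1
        # 2. Digit run: compare numerically (an empty run counts as 0).
        si, sj = i, j
        while is_digit(at(a, i)):
            i += 1
        while is_digit(at(b, j)):
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return na - nb
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch = 0
    if ":" in version:
        head, version = version.split(":", 1)
        epoch = int(head)
    upstream, sep, revision = version.rpartition("-")
    if not sep:                  # no Debian revision present
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return <0, 0 or >0 as version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _compare_part(ua, ub) or _compare_part(ra, rb)
```

With this, `compare_versions("2.2.0-1", "2.4.0~")` is negative (facter is too old for puppet 4.8), and `compare_versions("3.8.5-2~bpo8+1", "3.8.5-2")` is also negative, so stretch's 3.8.5-2 sorts after the jessie backport.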

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 7 2017, 2:19 PM
Andrew claimed this task. · Apr 7 2017, 3:56 PM
Andrew triaged this task as Medium priority.
Andrew added a comment. · Apr 7 2017, 5:05 PM

This is caused by the pinning in https://gerrit.wikimedia.org/r/#/c/300870/: jessie-backports has upgraded its puppet package, which causes conflicts.

Andrew added a comment. · Apr 7 2017, 5:15 PM

We can add 3.8 packages to reprepro, but that will affect puppet clients as well as masters, which we might not want.

Andrew added subscribers: Joe, faidon. · Apr 7 2017, 5:15 PM

My suggestion, which needs a little more time to be fully tested, is:

  • Take the latest 3.8 jessie-backport (from snapshot.debian.org), 3.8.5-2~bpo8+1, and put it in jessie-wikimedia
  • Upgrade all of the trusty-wikimedia fleet to 3.8.5-2~bpo8trusty+2 (already in trusty-wikimedia)
  • Upgrade the whole jessie-wikimedia fleet to run 3.8.5-2~bpo8+1 (now in jessie-wikimedia).
  • (stretch-wikimedia already has 3.8.5-2)

This means that we're going to be running 3.8.5 everywhere, agents and masters. This should work (and it does work on the stretch machines) but needs a little care.

In the meantime, the stopgap solution would be to include 3.8.5-2~bpo8+1 in jessie-wikimedia but under the experimental section and only install that on the puppetmasters. I'd prefer it if we didn't go with that, unless the 3.7->3.8 upgrade path is too complicated. @akosiaris @Joe, thoughts?

Paladox added a subscriber: Paladox. · Apr 7 2017, 5:18 PM
Joe added a comment. · Apr 7 2017, 5:21 PM

Our production puppetmasters run on 3.8, several clients have been tested, and the agent should have minimal differences.

I can take a look back at the changelog for confirmation, but I'm pretty confident it's a safe upgrade for the agents.

> My suggestion, which needs a little more time to be fully tested, is:

> • Take the latest 3.8 jessie-backport (from snapshot.debian.org), 3.8.5-2~bpo8+1, and put it in jessie-wikimedia

There are a number of clients (a quick lookup in the mw fleet shows mw1168, mw1169, mw2118, mw2119) already running 3.8 and the versions are compatible. We should be able to do that with no repercussions.

> • Upgrade all of the trusty-wikimedia fleet to 3.8.5-2~bpo8trusty+2 (already in trusty-wikimedia)

I think this would only reduce the version dispersion across our installed base. That is, it's not required, but it would be nice to do.

> • Upgrade the whole jessie-wikimedia fleet to run 3.8.5-2~bpo8+1 (now in jessie-wikimedia).

Ditto.

> • (stretch-wikimedia already has 3.8.5-2)

> This means that we're going to be running 3.8.5 everywhere, agents and masters. This should work (and it does work on the stretch machines) but needs a little care.
> In the meantime, the stopgap solution would be to include 3.8.5-2~bpo8+1 in jessie-wikimedia but under the experimental section and only install that on the puppetmasters. I'd prefer it if we didn't go with that, unless the 3.7->3.8 upgrade path is too complicated. @akosiaris @Joe, thoughts?

I see no reason for using experimental, let's just put it under main.

I'll do the first task from above (get the 3.8 bpo and put it in jessie-wikimedia/main) and possibly do the puppet upgrade on the jessie hosts.

The package was uploaded to jessie-wikimedia/backports a while ago, and some basic checks look fine.

For what it's worth, puppet-master is a new package name (it's only present in jessie-backports). It's not compatible with our current infrastructure (neither the puppetmasters nor the clients, as the description points out), so, confusing as the name is, it should be avoided.

Package: puppet-master
Source: puppet
Version: 4.8.2-3~bpo8+1
Installed-Size: 97
Maintainer: Puppet Package Maintainers <pkg-puppet-devel@lists.alioth.debian.org>
Architecture: all
Replaces: puppetmaster (<< 4.8.1-3)
Depends: init-system-helpers (>= 1.18~), puppet (= 4.8.2-3~bpo8+1), ruby | ruby-interpreter, lsb-base (>= 3.0-6)
Breaks: puppetmaster (<< 4.8.1-3)
Description-en: configuration management system, master service
 Puppet is a configuration management system that allows you to define
 the state of your IT infrastructure, then automatically enforces the
 correct state.
 .
 This package contains the "puppet-master" service and init script,
 which is suitable for small deployments. For larger deployments or if you wish
 to support Puppet 3 clients, puppet-master-passenger should be used instead.
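The `Depends` field above is what makes puppet-master unusable next to a 3.x agent: it requires puppet at exactly 4.8.2-3~bpo8+1. As a hypothetical helper for reading such records, here is a Python sketch that parses a single `apt-cache show` stanza (folding indented continuation lines into the preceding field) and splits a relationship field into comma-separated groups of `|` alternatives, each with an optional version constraint. Architecture restrictions and build profiles are deliberately not handled, so this is a sketch, not a complete control-file parser; `SAMPLE` is a trimmed copy of the stanza above.

```python
import re
from typing import Optional

# One alternative, e.g. "puppet (= 4.8.2-3~bpo8+1)" or "ruby".
# Simplified: "[arch]" restrictions and "<profile>" build profiles are rejected.
RELATION = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9.+-]*)(?::[a-z0-9-]+)?\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_stanza(text: str) -> dict[str, str]:
    """Parse one control-style record into {field: value}."""
    fields: dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break            # a blank line ends the record
            continue
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line[1:]   # drop the single leading space
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field line: {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def parse_relations(value: str) -> list[list[tuple[str, Optional[str], Optional[str]]]]:
    """Split 'a (>= 1), b | c' into [[('a', '>=', '1')], [('b', None, None), ('c', None, None)]]."""
    groups = []
    for group in value.split(","):
        alternatives = []
        for alt in group.split("|"):
            m = RELATION.match(alt)
            if m is None:
                raise ValueError(f"unsupported relation syntax: {alt!r}")
            alternatives.append((m["name"], m["op"], m["version"]))
        groups.append(alternatives)
    return groups


SAMPLE = """\
Package: puppet-master
Version: 4.8.2-3~bpo8+1
Depends: init-system-helpers (>= 1.18~), puppet (= 4.8.2-3~bpo8+1), ruby | ruby-interpreter, lsb-base (>= 3.0-6)
Breaks: puppetmaster (<< 4.8.1-3)
Description-en: configuration management system, master service
 Puppet is a configuration management system that allows you to define
 the state of your IT infrastructure, then automatically enforces the
 correct state.
"""
```

For the sample, the second `Depends` group is `[("puppet", "=", "4.8.2-3~bpo8+1")]`, the exact-version pin that cannot be satisfied while the fleet runs 3.x.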

Mentioned in SAL (#wikimedia-operations) [2017-04-11T13:53:21Z] <akosiaris> upgrade puppet agent to 3.8 across the jessie fleet. Do that in a stages, starting with parsoid hosts. move on to mw fleet next. T162462
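The SAL entry describes rolling the agent upgrade out in stages, parsoid hosts first and the mw fleet next. The tooling actually used is not recorded in this task, so the following is only a hypothetical Python sketch of that control flow: stages run in order, each in small batches, and the rollout halts at the first failing batch so later stages are never touched. `upgrade_host` stands in for whatever remote-execution call performs the upgrade, and `batch_size` is an illustrative default.

```python
from collections.abc import Callable, Sequence


def staged_rollout(
    stages: Sequence[tuple[str, Sequence[str]]],
    upgrade_host: Callable[[str], bool],
    batch_size: int = 5,
) -> tuple[list[str], list[str]]:
    """Upgrade hosts stage by stage, one batch at a time.

    `upgrade_host` is a placeholder: it should upgrade one host and return
    True on success. Returns (upgraded, failed); after the first batch that
    contains a failure, no further batches or stages are attempted.
    """
    upgraded: list[str] = []
    for stage_name, hosts in stages:
        for start in range(0, len(hosts), batch_size):
            batch = hosts[start:start + batch_size]
            print(f"[{stage_name}] upgrading {', '.join(batch)}")
            failed: list[str] = []
            for host in batch:
                if upgrade_host(host):
                    upgraded.append(host)
                else:
                    failed.append(host)
            if failed:
                return upgraded, failed  # halt the whole rollout here
    return upgraded, []
```

Halting on the first failure, rather than collecting failures and carrying on, keeps a bad package from reaching the larger mw fleet.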

The upgrade went fine on all the jessie hosts; now looking into how easy it is to do trusty as well.

I'm commenting on this here; please tell me if it's completely unrelated and I will create a new ticket:

According to icinga, db1090 has been failing to run puppet since April 11, 2017 14:00, flapping every time, but I cannot reproduce it when running puppet manually. Other hosts are probably affected too. It is a jessie host.

Fixing the puppetmaster issue requires changing (well, removing) the pinning in the puppet manifest, right? Is there a reason not to do that right away?

> I'm commenting on this here; please tell me if it's completely unrelated and I will create a new ticket:
> According to icinga, db1090 has been failing to run puppet since April 11, 2017 14:00, flapping every time, but I cannot reproduce it when running puppet manually. Other hosts are probably affected too. It is a jessie host.

Hm, I see what you mean.

[Wed Apr 12 06:09:18 2017] SERVICE ALERT: db1090;puppet last run;CRITICAL;HARD;3;CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[Wed Apr 12 06:09:18 2017] SERVICE NOTIFICATION: irc;db1090;puppet last run;CRITICAL;notify-service-by-irc;CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[Wed Apr 12 06:22:18 2017] SERVICE ALERT: db1090;puppet last run;OK;HARD;3;OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures

Looking into it
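One way to tell a flapping check from a steadily failing one is to count state changes in the icinga log. A hypothetical Python sketch, assuming lines shaped like the excerpt above (`SERVICE ALERT: host;service;state;state type;attempt;output`); notification lines are skipped, and the flapping threshold of four changes is an arbitrary assumption:

```python
import re
from collections import Counter
from collections.abc import Iterable

ALERT = re.compile(
    r"^\[(?P<ts>[^\]]+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[A-Z]+);"
    r"(?P<type>SOFT|HARD);(?P<attempt>\d+);(?P<output>.*)$"
)


def state_changes(lines: Iterable[str]) -> Counter:
    """Count state transitions per (host, service) seen in SERVICE ALERT lines."""
    last_state: dict[tuple[str, str], str] = {}
    changes: Counter = Counter()
    for line in lines:
        m = ALERT.match(line)
        if m is None:
            continue  # SERVICE NOTIFICATION and any other log lines
        key = (m["host"], m["service"])
        if key in last_state and last_state[key] != m["state"]:
            changes[key] += 1
        last_state[key] = m["state"]
    return changes


def flapping(lines: Iterable[str], threshold: int = 4) -> list[tuple[str, str]]:
    """Return the (host, service) pairs whose state changed at least `threshold` times."""
    return [key for key, n in state_changes(lines).items() if n >= threshold]
```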

>> I'm commenting on this here; please tell me if it's completely unrelated and I will create a new ticket:
>> According to icinga, db1090 has been failing to run puppet since April 11, 2017 14:00, flapping every time, but I cannot reproduce it when running puppet manually. Other hosts are probably affected too. It is a jessie host.

> Hm, I see what you mean.

> [Wed Apr 12 06:09:18 2017] SERVICE ALERT: db1090;puppet last run;CRITICAL;HARD;3;CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
> [Wed Apr 12 06:09:18 2017] SERVICE NOTIFICATION: irc;db1090;puppet last run;CRITICAL;notify-service-by-irc;CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
> [Wed Apr 12 06:22:18 2017] SERVICE ALERT: db1090;puppet last run;OK;HARD;3;OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures

> Looking into it

It has stopped happening since the last lines I pasted above (something run by cron? logrotate?). I'll keep an eye on it, though.

Mentioned in SAL (#wikimedia-operations) [2017-04-12T09:12:20Z] <_joe_> copying data from / to the neww partition on ocg1003 T162462

Joe added a comment. · Apr 12 2017, 9:42 AM

> It has stopped happening since the last lines I pasted above (something run by cron? logrotate?). I'll keep an eye on it, though.

There was a rogue puppet agent running in the background on that server and others, started by a typo when puppet 3.7 was installed. Every 30 minutes that agent tried to run puppet and failed; then, when the cron-triggered run happened, it caused the recovery, hence the flapping.
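A leftover daemonized agent like this shows up as a `puppet agent` process that has been alive far longer than any normal run. As a hypothetical helper, the Python sketch below scans output in the format of `ps -eo pid,etimes,args` (`etimes` is the elapsed time in seconds in procps) and reports agent processes older than a threshold; the one-hour default is an assumption, not a value from this task.

```python
def long_running_agents(ps_output: str, max_age_s: int = 3600) -> list[tuple[int, int, str]]:
    """Return (pid, elapsed_seconds, command) for 'puppet agent' processes older than max_age_s.

    Expects the output of `ps -eo pid,etimes,args`, including its header row.
    """
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # blank or malformed line
        pid, elapsed, command = int(parts[0]), int(parts[1]), parts[2]
        if "puppet agent" in command and elapsed > max_age_s:
            found.append((pid, elapsed, command))
    return found
```

On a host, the input could come from `subprocess.run(["ps", "-eo", "pid,etimes,args"], capture_output=True, text=True).stdout`; the result is a list of candidates to inspect, not something to kill blindly.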

>> It has stopped happening since the last lines I pasted above (something run by cron? logrotate?). I'll keep an eye on it, though.

> There was a rogue puppet agent running in the background on that server and others, started by a typo when puppet 3.7 was installed. Every 30 minutes that agent tried to run puppet and failed; then, when the cron-triggered run happened, it caused the recovery, hence the flapping.

Ah, indeed. SAL says:

06:28 _joe_: killing long-running puppet-agent on db2058 too
06:20 _joe_: killing badly-started puppet agents on mc1010, tempdb2001,db1090, db2058, hydrogen, possibly others later

OK, that solves it. Thanks.

Change 347825 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/puppet@production] puppetmaster: Remove the jessie-backports pinning

https://gerrit.wikimedia.org/r/347825

Change 347825 merged by Alexandros Kosiaris:
[operations/puppet@production] puppetmaster: Remove the jessie-backports pinning

https://gerrit.wikimedia.org/r/347825

> Fixing the puppetmaster issue requires changing (well, removing) the pinning in the puppet manifest, right? Is there a reason not to do that right away?

Actually, removing it is required to make sure we don't end up breaking the entire infrastructure at some point. Otherwise, this is already fixed. I've removed the pin in https://gerrit.wikimedia.org/r/347825

Mentioned in SAL (#wikimedia-operations) [2017-04-12T11:02:06Z] <akosiaris> upgrade puppet across the trusty fleet to 3.8. T162462

akosiaris closed this task as Resolved. · Apr 12 2017, 11:20 AM

Note that after merging https://gerrit.wikimedia.org/r/347825, removal of /etc/apt/preferences.d/puppet.pref is required. I've done it in production and ended up not doing it in puppet.

In any case, trusty and jessie are now both on 3.8 across the fleet, the puppetmasters are unbroken, and we found some long-running processes caused by typos here and there. I'll call this a success and resolve.

I'm getting clean puppet runs on labs instances with role::puppetmaster::standalone now. So the labs case for this looks resolved; anything else to do here?

> I'm getting clean puppet runs on labs instances with role::puppetmaster::standalone now. So the labs case for this looks resolved; anything else to do here?

> removal of /etc/apt/preferences.d/puppet.pref is required. I've done it in production and ended up not doing it in puppet.

@Andrew, do we need to do a one-time removal of /etc/apt/preferences.d/puppet.pref?