
Upgrade PuppetDB to version 4.4
Closed, ResolvedPublic

Description

Parent task for PuppetDB upgrade to 4.4.

Tasks

  • Port puppetlabs PuppetDB 4.4 package to stretch T185502
  • Add PuppetDB version selector (puppet/hiera) T185501
  • Extend puppetmaster::puppetdb to support puppetlabs packaged puppetdb 4.4 T185500
  • support puppetdb4 with postgres backend in puppet-compiler T187258
  • Build a pair of debian stretch PuppetDB servers T185499

Event Timeline

herron created this task. Oct 2 2017, 8:50 PM
Restricted Application added a subscriber: Aklapper. Oct 2 2017, 8:50 PM
herron updated the task description. Oct 3 2017, 6:16 PM
Volans added a subscriber: Volans. Jan 9 2018, 10:46 PM
herron added a subscriber: faidon. Jan 10 2018, 9:30 PM

So we’ll need to select a puppetdb version and package to proceed.

PuppetDB 4.4 looks like the version we should target: according to the puppetlabs docs, it is the newest release that still supports puppet 4.

In terms of packages, a puppetdb-4.4.1 package was added to Debian unstable in Aug 2017 and has trickled into the Ubuntu universe. On a stretch test instance this package pulled in 162 dependencies from unstable, which is not ideal. Still, it seems preferable to starting new packages for the same puppetdb version from scratch.

Thoughts?

Spoke with Faidon about this a bit. There are some issues with the unstable puppetdb-4.4.1 package that must be fixed before it transitions into testing, etc. Since the timeframe for that is not known we'll try using the puppetlabs puppetdb packages.

In the puppetlabs apt repo there appear to be only puppetdb 5 packages (which are not compatible with puppet 4) for stretch. Many puppetdb versions are packaged for jessie. Hopefully the jessie packages can be massaged to run on stretch, but if that doesn't work we may want to keep puppetdb on jessie for the time being.

The puppetlabs puppetdb 4.4 jessie package is running successfully in labs on a stretch instance (puppet project).

With a few local modifications puppetdb 4.4 is cooperating with role::puppetmaster::puppetdb (postgres backend, nginx tls frontend, etc) as well.

herron@puppetdb-keith-stretch1:~$ facter -p lsbdistcodename
stretch

herron@puppetdb-keith-stretch1:~$ curl -k https://localhost:443/pdb/meta/v1/version
{
  "version" : "4.4.0"
}
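
For anyone wanting to script the same check, here is a minimal Python sketch. It assumes the version string is purely numeric and dotted, as in the output above; the function names are made up for illustration, and certificate verification is skipped in the same way `curl -k` skips it.

```python
import json
import ssl
import urllib.request


def parse_version(body: str) -> tuple:
    """Turn the JSON body of /pdb/meta/v1/version into a tuple of ints.

    Assumes a purely numeric dotted version such as "4.4.0".
    """
    version = json.loads(body)["version"]
    return tuple(int(part) for part in version.split("."))


def is_target_release(version: tuple, major: int = 4, minor: int = 4) -> bool:
    """True when the server reports the expected major.minor series."""
    return version[:2] == (major, minor)


def fetch_version(base_url: str = "https://localhost:443") -> tuple:
    """Query a PuppetDB server for its version (no certificate checks)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = base_url + "/pdb/meta/v1/version"
    with urllib.request.urlopen(url, context=context) as resp:
        return parse_version(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline example using the response body shown in the transcript.
    sample = '{ "version" : "4.4.0" }'
    print(is_target_release(parse_version(sample)))
```

Keeping the parsing separate from the HTTP call means the check can be exercised without a live PuppetDB instance.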

Will spin up tasks for changes needed to puppetize puppetdb 4.4 using the puppetlabs package variant.

herron renamed this task from "Upgrade puppetDB to version 3.2 or newer" to "Upgrade PuppetDB to version 4.4". Jan 22 2018, 4:13 PM
herron updated the task description.

Mentioned in SAL (#wikimedia-operations) [2018-02-26T14:50:10Z] <godog> upload puppetdb 4.4.0-1~wmf1 to stretch-wikimedia - T177253

herron updated the task description. Feb 27 2018, 10:10 PM

Mentioned in SAL (#wikimedia-operations) [2018-03-01T15:12:54Z] <godog> upload puppetdb 4.4.0-1~wmf1 to component/puppetdb4 - T177253

Change 417370 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] add puppetdb4 support to puppetmaster::puppetdb::client

https://gerrit.wikimedia.org/r/417370

Change 417370 merged by Herron:
[operations/puppet@production] add puppetdb4 support to puppetmaster::puppetdb::client

https://gerrit.wikimedia.org/r/417370

Change 417459 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] add puppetdb-termini package require in puppetmaster module

https://gerrit.wikimedia.org/r/417459

Change 417459 merged by Herron:
[operations/puppet@production] fix puppetdb terminus package conflict in ::puppetmaster

https://gerrit.wikimedia.org/r/417459

Mentioned in SAL (#wikimedia-operations) [2018-03-12T12:44:20Z] <godog> start a catalog compilation on elnath to check for puppetdb4 diffs - T177253

Change 420062 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] puppetdb_upgrade: point codfw puppet masters to puppetdb2001

https://gerrit.wikimedia.org/r/420062

herron added a comment (edited). Mar 16 2018, 4:39 PM

Next week after the codfw puppet masters have been upgraded to stretch I plan to upgrade codfw to puppetdb 4 with this migration plan:

  1. depool codfw puppetmasters (via dns, patch to be created after codfw stretch upgrade complete)
  2. perform puppet --noop runs on all hosts to send up to date facts and catalogs to new puppetdb without applying changes
  3. merge hieradata update for codfw masters puppetmaster2001, puppetmaster2002 (https://gerrit.wikimedia.org/r/#/c/420062/)
  4. run puppet on codfw masters
    • ensure puppetdb-termini 4.4 is installed on puppetmaster2001, puppetmaster2002
    • ensure puppetdb.conf contains server_urls syntax and reflects new puppetdb server
  5. run catalog diffs to ensure codfw masters are working correctly
  6. begin puppet downtime across production realm
  7. disable production realm puppet agents globally
  8. cut over puppetmaster service from eqiad to codfw (via dns, patch to be created after codfw stretch upgrade complete)
  9. merge puppetdbquery patch (https://gerrit.wikimedia.org/r/#/c/410050/)
    • note: this puppetdbquery patch must be reverted if failback to eqiad is necessary
  10. ensure puppetdb host list on nitrogen matches puppetdb host list on puppetdb1001. clean up as needed (hosts likely have been deactivated along the way causing some drift)
  11. gradually re-enable agents, saving larger puppetdb consumers for last (icinga, lvs, cp, prometheus, restbase, redis)
  12. ensure all agents (that weren't disabled for other reasons) are re-enabled via cumin
  13. end puppet downtime
  14. update cumin to use new puppetdb backend
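
The host list comparison in step 10 could be scripted along these lines. This is only a sketch: it assumes each server's node list has already been fetched (for example from the `/pdb/query/v4/nodes` endpoint) and parsed into a list of dicts carrying a `certname` key and optional `deactivated` / `expired` timestamps. The function names and example hostnames are hypothetical.

```python
def active_certnames(nodes):
    """Return the set of certnames that are neither deactivated nor expired."""
    return {
        node["certname"]
        for node in nodes
        if not node.get("deactivated") and not node.get("expired")
    }


def host_list_drift(old_nodes, new_nodes):
    """Compare the active hosts known to the old and new PuppetDB servers."""
    old = active_certnames(old_nodes)
    new = active_certnames(new_nodes)
    return {
        "missing_from_new": sorted(old - new),
        "only_in_new": sorted(new - old),
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from production.
    old_nodes = [
        {"certname": "a.example.org"},
        {"certname": "b.example.org"},
    ]
    new_nodes = [
        {"certname": "b.example.org"},
        {"certname": "c.example.org", "deactivated": "2018-03-20T00:00:00Z"},
    ]
    print(host_list_drift(old_nodes, new_nodes))
```

Hosts listed under `missing_from_new` are the ones that would need a puppet run (or cleanup) before agents are re-enabled; hosts under `only_in_new` are candidates for deactivation on the new server.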

Change 420691 had a related patch set uploaded (by Herron; owner: Herron):
[operations/dns@master] depool codfw puppetmaster (via dns)

https://gerrit.wikimedia.org/r/420691

Mentioned in SAL (#wikimedia-operations) [2018-03-20T13:29:01Z] <herron> depooling codfw puppet masters via dns T177253

Change 420691 merged by Herron:
[operations/dns@master] depool codfw puppetmaster (via dns)

https://gerrit.wikimedia.org/r/420691

Change 420062 merged by Herron:
[operations/puppet@production] puppetdb_upgrade: point codfw puppet masters to puppetdb2001

https://gerrit.wikimedia.org/r/420062

Mentioned in SAL (#wikimedia-operations) [2018-03-20T14:59:57Z] <herron> codfw puppet masters upgraded to puppetdb4. placing puppet agents into icinga downtime and beginning puppet --noop runs (to send facts to new puppetdb) T177253

Change 420827 had a related patch set uploaded (by Herron; owner: Herron):
[operations/dns@master] depool eqiad puppetmaster and point puppet service records to codfw

https://gerrit.wikimedia.org/r/420827

Change 420827 merged by Herron:
[operations/dns@master] depool eqiad puppetmaster and point puppet service records to codfw

https://gerrit.wikimedia.org/r/420827

The steps in https://phabricator.wikimedia.org/T177253#4056981 have been completed and production is now running puppetdb 4.4 from codfw.

Here are my notes from the upgrade and current status:

  • Currently codfw puppet masters are live and eqiad masters remain depooled (with the exception of ca/volatile service). We can proceed with puppetdb upgrade in eqiad after puppetmaster1001 is upgraded to stretch.
  • Manual puppet runs (via cumin) across the fleet, both noop and normal agent runs, made the puppetdb command queue deeper than puppetdb could cope with. I didn't realize this until later in the process. In hindsight, the queue depth should have been monitored closely throughout to ensure puppetdb was not overwhelmed.
  • The above, in conjunction with hosts having already submitted facts to the new puppetdb servers weeks ago, resulted in some confusing behavior. A subset of hosts had ssh host keys that reverted to older values or became empty. However, after the puppetdb command queue was cleared, host keys were updated during the next puppet run as expected. The command queue settled down and has been sitting at ~0 for the past hour.
  • Hosts with puppet disabled, or with pre-existing run failures, will need a puppet agent run to happen in order to populate the new puppetdb.
  • Einsteinium has been left with puppet disabled for now. As stated above, hosts that have puppet disabled will need to perform at least a noop run in order to send facts to the new puppetdb. It can't be left this way for long, but I think it is better to temporarily delay host adds while reviewing than to unexpectedly remove hosts from icinga.
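
As a rough illustration of the queue monitoring mentioned in the notes above, the sketch below decides whether a batch of puppet runs should be paused based on a series of command queue depth readings. How the readings are collected is left out (the PuppetDB metrics API is one likely source), and the function name, `max_depth` threshold, and `window` size are hypothetical values chosen for the example, not tuned figures.

```python
def queue_is_backing_up(depth_samples, max_depth=1000, window=3):
    """Return True when the command queue looks overwhelmed.

    The queue is flagged when the latest reading exceeds max_depth, or
    when the last `window` readings are strictly increasing.
    """
    if not depth_samples:
        return False
    if depth_samples[-1] > max_depth:
        return True
    recent = depth_samples[-window:]
    if len(recent) < window:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    # Hypothetical readings taken between batches of agent runs.
    print(queue_is_backing_up([0, 0, 0]))       # steady, keep going
    print(queue_is_backing_up([10, 50, 200]))   # rising, pause the batch
    print(queue_is_backing_up([5000]))          # already too deep
```

A check like this would sit between cumin batches, so that the next batch of agent runs only starts once the queue has drained.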

Change 420946 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] cumin: point cumin master to puppetdb1001

https://gerrit.wikimedia.org/r/420946

Mentioned in SAL (#wikimedia-operations) [2018-03-21T01:51:28Z] <herron> codfw puppetdb upgrade complete. eqiad puppetmaster remains depooled T177253

Change 420946 merged by Volans:
[operations/puppet@production] cumin: point cumin master to puppetdb1001

https://gerrit.wikimedia.org/r/420946

List of hosts with puppet disabled since before the migration, that are missing in the new puppetdb and would disappear from Icinga upon re-enabling puppet there:

labtestcontrol2003.wikimedia.org
labtestneutron2001.codfw.wmnet
labtestneutron2002.codfw.wmnet
labvirt1019.eqiad.wmnet
labvirt1020.eqiad.wmnet
labvirt1022.eqiad.wmnet
lvs1008.eqiad.wmnet
lvs1009.eqiad.wmnet
mwdebug2001.codfw.wmnet

I'll follow up with the owners for re-enabling or making a single puppet run.

Mentioned in SAL (#wikimedia-operations) [2018-03-21T10:16:52Z] <volans> re-enabling puppet on einsteinium (icinga host) see T177253#4067901

herron updated the task description. Mar 21 2018, 2:36 PM

Change 421532 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] upgrade eqiad puppet masters to puppetdb4

https://gerrit.wikimedia.org/r/421532

Change 421532 merged by Herron:
[operations/puppet@production] upgrade eqiad puppet masters to puppetdb4

https://gerrit.wikimedia.org/r/421532

herron closed this task as Resolved. Mar 29 2018, 4:43 PM
herron claimed this task.

PuppetDB 4.4 upgrade is complete