Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3
Closed, Resolved, Public

Description

@Krenair migrated the main puppet masters for Cloud VPS to Puppet 5 (thanks!), but the self-hosted puppet masters listed below are still running Puppet 4. They will break when Puppet 5-specific code is merged to puppet.git (or when code is merged that is buggy with Puppet < 5).

They need to be replaced with an equivalent VM running Buster (which ships Puppet 5 by default) or removed if obsolete. If no puppetdb is used, they can instead be upgraded in place to Puppet 5 using component/puppet5 and component/facter3.
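The replace/remove/upgrade rule above can be written down as a small decision function. This is a hypothetical sketch, not tooling that exists for this task: the `Puppetmaster` record, its fields and the example hostname are invented for illustration, and where both options are open it assumes the cheaper in-place upgrade is preferred over a Buster rebuild.

```python
from dataclasses import dataclass


@dataclass
class Puppetmaster:
    """Hypothetical record describing one self-hosted puppet master."""
    fqdn: str
    puppet_version: str   # e.g. the output of `puppet --version`, such as "4.8.2"
    uses_puppetdb: bool
    obsolete: bool = False


def major_version(version: str) -> int:
    """Return the major component of a dotted version string."""
    return int(version.strip().split(".")[0])


def migration_action(pm: Puppetmaster) -> str:
    """Apply the rule from the task description to one puppet master."""
    if major_version(pm.puppet_version) >= 5:
        return "none"  # already on Puppet 5 or later
    if pm.obsolete:
        return "remove"  # forgotten or unused VM
    if not pm.uses_puppetdb:
        # Assumption: prefer the in-place upgrade whenever it is allowed.
        return "upgrade in place (component/puppet5 + component/facter3)"
    return "rebuild on Buster"  # puppetdb in use, so replace the VM


pm = Puppetmaster("example-pm.example.eqiad.wmflabs", "4.8.2", uses_puppetdb=False)
print(migration_action(pm))
# upgrade in place (component/puppet5 + component/facter3)
```

Encoding the rule this way makes the one real branch point explicit: whether puppetdb is in use decides between an in-place package upgrade and a full rebuild.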

  • af-puppetmaster02.automation-framework.eqiad.wmflabs - jessie! handled in T236582: "automation-framework" Cloud VPS project jessie deprecation [deleted]
  • clouddb-services-puppetmaster-01.clouddb-services.eqiad.wmflabs - (upgraded puppet/facter)
  • cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs - @Krenair working on it - shut off, ready to terminate on 11th March
  • cloudstore-puppetmaster-01.cloudstore.eqiad.wmflabs - (rebuilt on buster)
  • debmonitor-pm.sso.eqiad.wmflabs - @MoritzMuehlenhoff will handle it
  • deployment-dumps-puppetmaster02.deployment-prep.eqiad.wmflabs - @Krenair working on it - shut off, ready to terminate on 14th March
  • filippo-log-stretch01.logging.eqiad.wmflabs (deleted)
  • icingaduty-puppetmaster-1.icingaduty.eqiad.wmflabs - (removed)
  • integration-puppetmaster01.integration.eqiad.wmflabs - jessie! - T236576: Move all Wikimedia CI (WMCS integration project) instances from jessie to stretch - @Krenair working on it - shut off, ready to terminate on 12th March
  • jbond-stretch-pm.puppet.eqiad.wmflabs - deliberately left on for now to test changes affecting old puppetmasters
  • jeh-puppetmaster.testlabs.eqiad.wmflabs (rebuilt on buster)
  • keith-puppetmaster.puppet.eqiad.wmflabs - (removed)
  • keith-puppetmaster1.puppet.eqiad.wmflabs - (removed)
  • maps-puppetmaster.maps.eqiad.wmflabs - does @TheDJ know anything about this?
  • openstack-puppetmaster-01.openstack.eqiad.wmflabs - (removed)
  • paws-puppetmaster-01.paws.eqiad.wmflabs - (upgraded puppet/facter)
  • puppet-lta.lta-tracker.eqiad.wmflabs - @Zppix
  • puppet-paladox.git.eqiad.wmflabs - @Paladox?
  • puppet-phabricator.phabricator.eqiad.wmflabs - @Paladox?
  • shinken-puppetmaster-01.shinken.eqiad.wmflabs - (rebuilt on buster)
  • toolsbeta-puppetmaster-02.toolsbeta.eqiad.wmflabs - cloud-services-team - @Krenair working on it - shut off, ready to terminate on 26th March

Related Objects

| Status | Assigned |
| --- | --- |
| Resolved | MoritzMuehlenhoff |
| Resolved | crusnov |
| Resolved | Krenair |
| Resolved | Jdforrester-WMF |
| Resolved | hashar |
| Resolved | Jdforrester-WMF |
| Resolved | hashar |
| Resolved | hashar |
| Resolved | hashar |
| Resolved | hashar |
| Resolved | jeena |
| Resolved | Jdforrester-WMF |
| Resolved | Jdrewniak |
| Resolved | Mstyles |
| Resolved | Jdforrester-WMF |
| Resolved | Jdforrester-WMF |
| Resolved | Jdforrester-WMF |
| Resolved | aborrero |

Event Timeline

herron triaged this task as Medium priority. (Jan 3 2020, 7:43 PM)

Worth noting there's a small army of random puppetmasters lying around running Puppet 4:

  • jeh-puppetmaster.testlabs.eqiad.wmflabs
  • integration-puppetmaster01.integration.eqiad.wmflabs - jessie!
  • puppet-paladox.git.eqiad.wmflabs
  • icingaduty-puppetmaster-1.icingaduty.eqiad.wmflabs
  • af-puppetmaster02.automation-framework.eqiad.wmflabs - jessie!
  • keith-puppetmaster1.puppet.eqiad.wmflabs
  • cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs - edit: working on it
  • filippo-log-stretch01.logging.eqiad.wmflabs
  • cloudstore-puppetmaster-01.cloudstore.eqiad.wmflabs
  • puppet-lta.lta-tracker.eqiad.wmflabs
  • deployment-dumps-puppetmaster02.deployment-prep.eqiad.wmflabs - this will be either me or Ariel
  • puppet-phabricator.phabricator.eqiad.wmflabs
  • shinken-puppetmaster-01.shinken.eqiad.wmflabs
  • maps-puppetmaster.maps.eqiad.wmflabs
  • jbond-stretch-pm.puppet.eqiad.wmflabs
  • debmonitor-pm.sso.eqiad.wmflabs
  • toolsbeta-puppetmaster-02.toolsbeta.eqiad.wmflabs
  • openstack-puppetmaster-01.openstack.eqiad.wmflabs
  • paws-puppetmaster-01.paws.eqiad.wmflabs
  • keith-puppetmaster.puppet.eqiad.wmflabs
  • clouddb-services-puppetmaster-01.clouddb-services.eqiad.wmflabs

Maybe I was too quick to merge this, and this should instead be "Migrate all Cloud VPS puppetmasters to Puppet 5 / facter 3", with the other two tasks (and others created for this list) as subtasks?

Or we move that to a separate task? From my PoV it's mostly on whoever creates a self-hosted puppet master to catch up with large-scale changes to the "upstream" puppet masters, so creating a tracking task and CCing people on it should be enough?

Sure, let's make this the tracking task? Or do you think we should have a separate task to track custom puppetmasters etc.?

MoritzMuehlenhoff updated the task description.

Ack, I turned that into a tracking task now

I think we can let people handle it however they prefer workflow-wise. For some more complex migrations it might be useful to create a subtask, but some are also very likely easy wins (obsolete VMs that people simply forgot to remove, etc.) which can be resolved without a subtask.

MoritzMuehlenhoff renamed this task from "Migrate Cloud VPS to Puppet 5 / facter 3" to "Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3". (Mar 3 2020, 10:53 AM)

Regarding jbond-stretch-pm.puppet.eqiad.wmflabs: this is available so I can continue to test that changes work on Puppet version 4. Once everything else has been migrated/updated, it can be deleted without issue.

Krenair updated the task description.
Krenair added a subscriber: herron.

Mentioned in SAL (#wikimedia-cloud) [2020-03-04T22:33:31Z] <Krenair> Shutoff cloudinfra-internal-puppetmaster01, replaced with -02 per T241719

@ArielGlenn Hey, do you still need a separate puppetmaster (deployment-dumps-puppetmaster02) for deployment-snapshot01, distinct from the usual deployment-prep puppetmaster (now deployment-puppetmaster04)? If so, I'll replace it with a deployment-dumps-puppetmaster03 running buster, but if not I'd like to get rid of it. From what I can tell it just has far fewer cherry-picks?

Hey @Krenair, I don't need it right now, but I might need it again in the future. Basically I used it when we were transitioning from HHVM to PHP, or when testing serious refactoring of the mediawiki puppet module. I don't imagine having to go through something like that again for some time... but who knows. Famous last words and all that.

Yeah, I can see that being a use case for a separate puppetmaster. I think in this case I'll probably move deployment-snapshot to the normal deployment-prep puppetmaster and get rid of deployment-dumps-puppetmaster02. Then if and when a new such need emerges we can make a buster (or $current_latest_OS) puppetmaster for it.

Mentioned in SAL (#wikimedia-cloud) [2020-03-06T19:24:43Z] <jeh> create new puppetmaster cloudstore-puppetmaster-02 T241719

Mentioned in SAL (#wikimedia-cloud) [2020-03-06T20:39:11Z] <jeh> delete old puppetmaster cloudstore-puppetmaster-01 T241719

Mentioned in SAL (#wikimedia-cloud) [2020-03-06T21:09:56Z] <jeh> create new puppetmaster shinken-puppetmaster-02 T241719

Mentioned in SAL (#wikimedia-cloud) [2020-03-06T22:15:28Z] <jeh> migrate existing VMs to new shinken-puppetmaster-02 (local commits restored from shinken-puppetmaster-01) T241719

Mentioned in SAL (#wikimedia-cloud) [2020-03-06T22:17:46Z] <jeh> delete shinken-puppetmaster-01 T241719

@aborrero: Hi, do you still need openstack-puppetmaster-01? It's not got any cherry-picks in operations/puppet or labs/private, it's not got anything in volatile, it's not autosigning but all certs that have been signed on it (other than its own) are for instances that no longer appear to be in DNS.

Krenair updated the task description.

Mentioned in SAL (#wikimedia-releng) [2020-03-17T19:49:34Z] <James_F> Deleted deployment-dumps-puppetmaster02 for T241719

Mentioned in SAL (#wikimedia-cloud) [2020-03-19T23:18:12Z] <Krenair> Shut down toolsbeta-puppet(db-01|master-02) - T241719

Mentioned in SAL (#wikimedia-cloud) [2020-03-20T14:03:36Z] <jeh> upgrade paws-puppetmaster-01 to v5 T241719

Mentioned in SAL (#wikimedia-cloud) [2020-03-20T14:58:33Z] <jeh> delete cloudvps project and VMs T241719

@Krenair puppet is broken in toolsbeta with

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Failed to execute '/pdb/cmd/v1?checksum=e81e3916522058b68262a32c3680af5185c63f4d&version=5&certname=toolsbeta-puppetdb-02.toolsbeta.eqiad.wmflabs&command=replace_facts&producer-timestamp=2020-03-31T23:04:34.942Z' on at least 1 of the following 'server_urls': https://toolsbeta-puppetdb-02.toolsbeta.eqiad.wmflabs

I haven't followed what's going on with this well enough to know if there's a quick fix for that.
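The failing request in the error above is a PuppetDB command URL, and splitting its query string shows which call failed: a `replace_facts` command (command version 5) for toolsbeta-puppetdb-02, sent to the `/pdb/cmd/v1` endpoint. This is a minimal sketch using only Python's standard `urllib.parse`; `describe_command` is a hypothetical helper, and the path is copied verbatim from the error message.

```python
from urllib.parse import parse_qs, urlsplit

# Request path copied from the Error 500 message above.
FAILED_PATH = (
    "/pdb/cmd/v1?checksum=e81e3916522058b68262a32c3680af5185c63f4d"
    "&version=5&certname=toolsbeta-puppetdb-02.toolsbeta.eqiad.wmflabs"
    "&command=replace_facts&producer-timestamp=2020-03-31T23:04:34.942Z"
)


def describe_command(path: str) -> dict:
    """Split a PuppetDB command path into its endpoint and query parameters."""
    parts = urlsplit(path)
    # parse_qs returns a list per key; each key appears once here.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    params["endpoint"] = parts.path
    return params


info = describe_command(FAILED_PATH)
print(info["endpoint"], info["command"], info["version"], info["certname"])
# /pdb/cmd/v1 replace_facts 5 toolsbeta-puppetdb-02.toolsbeta.eqiad.wmflabs
```

This only decodes the request. It says nothing about why the server answered with a 500; that turned out to be the puppetdb service itself not running.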

Ah, the puppetdb service isn't working. That I might be able to fix.

Restarting the service got it running; checking some things.

Mar 31 19:53:09 toolsbeta-puppetdb-02 kernel: [1624858.205612] oom_reaper: reaped process 17294 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

It was an OOM.
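The OOM diagnosis comes from the kernel's `oom_reaper` message. Below is a small, hedged sketch for spotting such lines and extracting the reaped process. The regex matches the exact message format in the log line above and may need adjusting for other kernel versions, and `reaped_processes` is a hypothetical helper; it can be fed lines from wherever the kernel log is available (for example `journalctl -k` output).

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the oom_reaper message format shown in the log line above.
OOM_REAPER = re.compile(
    r"oom_reaper: reaped process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
)


def reaped_processes(log_lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (pid, command name) for each oom_reaper line in the input."""
    for line in log_lines:
        match = OOM_REAPER.search(line)
        if match:
            yield int(match.group("pid")), match.group("comm")


sample = (
    "Mar 31 19:53:09 toolsbeta-puppetdb-02 kernel: [1624858.205612] "
    "oom_reaper: reaped process 17294 (java), now anon-rss:0kB, "
    "file-rss:0kB, shmem-rss:0kB"
)
print(list(reaped_processes([sample])))  # [(17294, 'java')]
```

On this host the reaped `java` process is most likely PuppetDB, which runs on the JVM.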

This is probably the OOM problem that's been affecting deployment-prep. I think I made a task for that somewhere...

This is an m1.small. Maybe the instance size is just too low for recent versions?

I can tell the toolsbeta puppetmaster is very slow and has been so for a while. So I guess yes, instance size is too small. Could benefit from at least 2 vCPUs.

Probably; for reference, the production puppetdb (with 1.5k hosts) uses 7-8 GB RAM.

Someone can remove the maps puppetmaster if they can restore the proper puppetmaster on the maps-tiles servers. I tried setting that up at some point, but got completely lost and ran out of time.

MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff updated the task description.

Closing this task: Puppet 5 and facter 3 have since been folded into the "main" component of the repo.