
Update puppet civicrm-prototype puppetmaster
Closed, Resolved (Public)

Description

All of cloud vps is being upgraded to puppet7 with new puppet infra. Each puppetmaster needs to be replaced with a version 7 puppetmaster, and then VMs upgraded to puppet7.

Your project contains the following v5 puppetmaster:

puppetserver.civicrm-prototype.eqiad1.wikimedia.cloud

Please take a moment to consider whether or not you still need this project puppetmaster. If you do, migrate with the following steps. Do not hesitate to ask for help from @Andrew or @taavi on IRC if you run into trouble.

In order to migrate:

  1. Make sure you have available quota to create a new g3.cores1.ram2.disk20 VM. If you need more space, please open a quota ticket.
  2. Create a 5GB cinder volume (named <projectname>-puppetserver or similar) and mount it as /srv on the existing puppetmaster. Then, on the existing puppetmaster:
$ sudo cp -a /var/lib/git /srv
$ sudo mkdir /srv/puppet
$ sudo cp -a /var/lib/puppet/server /srv/puppet
  3. Unmount and detach the cinder volume
  4. Create a new VM for the v7 puppet server, using a flavor with at least 2GB of RAM, Debian Bookworm, and a name with 'puppetserver' in it (rather than the deprecated 'puppetmaster')
  5. Mount the previously created cinder volume at /srv on the new server
  6. Make the new VM a puppetserver by following the directions at https://wikitech.wikimedia.org/wiki/Help:Project_puppetserver#Step_1:_Setup_a_puppetserver.

Puppet classes:

role::puppetserver::cloud_vps_project

hiera:

profile::puppet::agent::force_puppet7: true
puppetmaster: puppet
  7. Adjust ownership on the new puppetserver:
$ sudo chown -R gitpuppet /srv/git; sudo chgrp -R gitpuppet /srv/git
$ sudo chown -R puppet /srv/puppet; sudo chgrp -R puppet /srv/puppet
$ sudo run-puppet-agent; sudo run-puppet-agent
$ sudo systemctl restart puppetserver
$ sudo puppetserver-deploy-code
  8. Assuming that puppet is now running cleanly on the new puppetserver, move existing VMs to the new host with the hiera setting
puppetmaster: <new puppetserver fqdn>
  9. Finally, update clients of the new puppetserver with the hiera setting
profile::puppet::agent::force_puppet7: true

Debian Buster hosts will complain about not being able to install puppet7 but the warning is harmless for now.
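
The 'Adjust ownership' step depends on recursive ownership changes, so it can help to confirm afterwards that nothing under /srv/git or /srv/puppet was missed. The following is a minimal sketch and not part of the official procedure. The function name list_not_owned_by is hypothetical, and it relies only on the standard find -user test.

```shell
#!/bin/sh
# List every path under a directory that is NOT owned by the given user.
# An empty result means the whole tree belongs to that user.
# (Hypothetical helper, not part of the Cloud VPS tooling.)
list_not_owned_by() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}

# Example, using the paths and users from the 'Adjust ownership' step
# (may need root to read every directory):
#   list_not_owned_by /srv/git gitpuppet
#   list_not_owned_by /srv/puppet puppet
```

Any path printed by the first call is a candidate for the mixed-ownership problems that git and the puppetserver are sensitive to.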

Event Timeline

Thanks for the followup in IRC. We are migrating the service to the Prod VPS in the next week. After that, I'll determine if we still need this puppetserver for testing. If we do, I will follow the instructions to migrate, otherwise I will destroy the host and we will use the standard puppetserver.

Hello! Can I get an update on the status of this work?

We hit a snag with the deployment to prod and are still working through an issue (T363571). I'm hoping that we'll be able to get this sorted at our onsite at the end of this month (Jun 2024) and finish the transition to production. If we can't get it completed then, I'll work through a migration plan to bookworm and to the standard puppetserver for the cloud VPS. Just been holding on hoping we can just move on instead of keeping a custom puppetserver.

Migrated to the production VPS today. Hopefully we will be able to start the rebuild of the cloudVPS host next week to use the standard puppetserver and turn off the custom puppetserver.

Hello! Can I get an update on this?

After some testing (and much distraction by other projects) we're going to upgrade to puppetserver 7 instead of trying to run off the stock puppetserver. I have created the volume, copied the data, and created the new VPS. Will wrap up the work tomorrow.

Hi, sorry about this. I did make progress and then promptly got placed on a jury where I have been for the last month. I'll plan to look at this next week after I recover from triaging the rest of my backlog.

Ok. I think I'm going to need some help at this point. I have gone back to the puppetserver (puppetserver-01.civicrm-prototype) I created a few months ago and am trying to do the full migration. I have verified the steps up to 6 above and all was going well.

When I tried the puppet runs with the cinder volume (attached as /srv), I hit safe directory errors. I brought the old version of the puppet repo (from puppetserver.civicrm-prototype) up to the current production level (with the local modifications in place) and updated the copy on the cinder volume.

Then I started down the path of step 7 again. Updating the permissions ran fine so I ran the step of the two puppet agent runs. With that, I encountered the following error (timestamps and hostname removed from the logs for clarity):

puppet-agent[1901]: Using environment 'production'
puppet-agent[1901]: Retrieving pluginfacts
puppet-agent[1901]: Retrieving plugin
puppet-agent[1901]: Loading facts
puppet-agent[1901]: Caching catalog for puppetserver-01.civicrm-prototype.eqiad1.wikimedia.cloud
puppet-agent[1901]: Applying configuration version '(3e3c605601) Andrew Bogott - puppet-enc: wider use of project names instead of project IDs'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/File[/srv/git/operations/puppet/.git/hooks/post-checkout]/ensure) defined content as '{sha256}be1b3a114c6a361a9898fb1c359766b2d274054cf16e4e6cf3403d817bdcb0a8'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/File[/srv/git/operations/puppet/.git/hooks/post-commit]/ensure) defined content as '{sha256}be1b3a114c6a361a9898fb1c359766b2d274054cf16e4e6cf3403d817bdcb0a8'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/File[/srv/git/operations/puppet/.git/hooks/post-merge]/ensure) defined content as '{sha256}be1b3a114c6a361a9898fb1c359766b2d274054cf16e4e6cf3403d817bdcb0a8'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/Exec[puppetserver-deploy-code]/returns) fatal: detected dubious ownership in repository at '/srv/git/operations/puppet'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/Exec[puppetserver-deploy-code]/returns) To add an exception for this directory, call:
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/Exec[puppetserver-deploy-code]/returns) #011git config --global --add safe.directory /srv/git/operations/puppet
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/Exec[puppetserver-deploy-code]/returns) ERROR: Unable to obtain the current branch
puppet-agent[1901]: '/usr/local/bin/puppetserver-deploy-code' returned 1 instead of one of [0]
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/Exec[puppetserver-deploy-code]/returns) change from 'notrun' to ['0'] failed: '/usr/local/bin/puppetserver-deploy-code' returned 1 instead of one of [0]
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/File[/srv/git/labs/private/.git/hooks/post-checkout]/ensure) defined content as '{sha256}be1b3a114c6a361a9898fb1c359766b2d274054cf16e4e6cf3403d817bdcb0a8'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/File[/srv/git/labs/private/.git/hooks/post-commit]/ensure) defined content as '{sha256}be1b3a114c6a361a9898fb1c359766b2d274054cf16e4e6cf3403d817bdcb0a8'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/File[/srv/git/labs/private/.git/hooks/post-merge]/ensure) defined content as '{sha256}be1b3a114c6a361a9898fb1c359766b2d274054cf16e4e6cf3403d817bdcb0a8'
puppet-agent[1901]: (/Stage[main]/Profile::Puppetserver::Git/File[/etc/puppet/private]/ensure) created
puppet-agent[1901]: (Class[Profile::Puppetserver::Git]) Unscheduling all events on Class[Profile::Puppetserver::Git]
puppet-agent[1901]: (/Stage[main]/Puppetserver/Service[puppetserver]) Dependency Exec[puppetserver-deploy-code] has failures: true
puppet-agent[1901]: (/Stage[main]/Puppetserver/Service[puppetserver]) Skipping because of failed dependencies
puppet-agent[1901]: Applied catalog in 8.05 seconds
puppet-agent[2242]: Using environment 'production'
puppet-agent[2242]: Retrieving pluginfacts
puppet-agent[2242]: Retrieving plugin
puppet-agent[2242]: Loading facts
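
For context on the 'dubious ownership' message in the log above: recent git versions refuse to operate on a repository whose directories are owned by a different user than the one running the command, unless the path is listed in safe.directory. The sketch below approximates that test for the current user. The function name repo_owned_by_me is hypothetical, and which account runs puppetserver-deploy-code is not stated in this task, so run the check as whichever user applies.

```shell
#!/bin/sh
# Return 0 when both the work tree and its .git directory are owned by
# the user running this function. This only approximates git's own
# ownership test; it does not read safe.directory.
# (Hypothetical helper for illustration.)
repo_owned_by_me() {
    repo=$1
    [ -O "$repo" ] && [ -O "$repo/.git" ]
}

# Example:
#   repo_owned_by_me /srv/git/operations/puppet || echo "ownership mismatch"
```

Fixing the ownership is generally preferable to adding a safe.directory exception, because the exception hides the mismatch rather than resolving it.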

At this point, the puppetserver process has been stopped and will not restart. When I tried to start it I see these errors in the logs:

systemd[1]: Starting puppetserver.service - Puppet Server...
java[2556]: Exception in thread "main" java.lang.IllegalStateException: Unable to borrow JRubyInstance from pool
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_internal$eval25425$borrow_from_pool_BANG__STAR___25430$fn__25431.invoke(jruby_internal.clj:313)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_internal$eval25425$borrow_from_pool_BANG__STAR___25430.invoke(jruby_internal.clj:300)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_internal$eval25472$borrow_from_pool_with_timeout__25477$fn__25478.invoke(jruby_internal.clj:348)  
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_internal$eval25472$borrow_from_pool_with_timeout__25477.invoke(jruby_internal.clj:337)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.instance_pool$eval28315$fn__28328.invoke(instance_pool.clj:48)
java[2556]: #011at puppetlabs.services.protocols.jruby_pool$eval26341$fn__26375$G__26318__26382.invoke(jruby_pool.clj:3)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.jruby_core$eval26891$borrow_from_pool_with_timeout__26896$fn__26897.invoke(jruby_core.clj:222)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.jruby_core$eval26891$borrow_from_pool_with_timeout__26896.invoke(jruby_core.clj:209)
java[2556]: #011at puppetlabs.services.config.puppet_server_config_core$eval37399$get_puppet_config__37404$fn__37405$fn__37406.invoke(puppet_server_config_core.clj:107)
java[2556]: #011at puppetlabs.services.config.puppet_server_config_core$eval37399$get_puppet_config__37404$fn__37405.invoke(puppet_server_config_core.clj:107)
java[2556]: #011at puppetlabs.services.config.puppet_server_config_core$eval37399$get_puppet_config__37404.invoke(puppet_server_config_core.clj:102)
java[2556]: #011at puppetlabs.services.config.puppet_server_config_service$reify__37434$service_fnk__5716__auto___positional$reify__37445.init(puppet_server_config_service.clj:25)
java[2556]: #011at puppetlabs.trapperkeeper.services$eval5514$fn__5515$G__5502__5518.invoke(services.clj:9)
java[2556]: #011at puppetlabs.trapperkeeper.services$eval5514$fn__5515$G__5501__5522.invoke(services.clj:9)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16416$run_lifecycle_fn_BANG___16423$fn__16424.invoke(internal.clj:196)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16416$run_lifecycle_fn_BANG___16423.invoke(internal.clj:179)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16445$run_lifecycle_fns__16450$fn__16451.invoke(internal.clj:229)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16445$run_lifecycle_fns__16450.invoke(internal.clj:206)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval17087$build_app_STAR___17096$fn$reify__17108.init(internal.clj:614)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval17137$boot_services_for_app_STAR__STAR___17144$fn__17145$fn__17147.invoke(internal.clj:648)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval17137$boot_services_for_app_STAR__STAR___17144$fn__17145.invoke(internal.clj:647)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval17137$boot_services_for_app_STAR__STAR___17144.invoke(internal.clj:641)
java[2556]: #011at clojure.core$partial$fn__5910.invoke(core.clj:2647)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16490$initialize_lifecycle_worker__16501$fn__16502$fn__16665$state_machine__13652__auto____16690$fn__16693.invoke(internal.clj:249)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16490$initialize_lifecycle_worker__16501$fn__16502$fn__16665$state_machine__13652__auto____16690.invoke(internal.clj:249)
java[2556]: #011at clojure.core.async.impl.ioc_macros$run_state_machine.invokeStatic(ioc_macros.clj:978)
java[2556]: #011at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:977)
java[2556]: #011at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invokeStatic(ioc_macros.clj:982)
java[2556]: #011at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:980)
java[2556]: #011at clojure.core.async$ioc_alts_BANG_$fn__13899.invoke(async.clj:421)
java[2556]: #011at clojure.core.async$do_alts$fn__13830$fn__13833.invoke(async.clj:288)
java[2556]: #011at clojure.core.async.impl.channels.ManyToManyChannel$fn__7557$fn__7558.invoke(channels.clj:99)
java[2556]: #011at clojure.lang.AFn.run(AFn.java:22)
java[2556]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
java[2556]: #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java[2556]: #011at clojure.core.async.impl.concurrent$counted_thread_factory$reify__7426$fn__7427.invoke(concurrent.clj:29)
java[2556]: #011at clojure.lang.AFn.run(AFn.java:22)
java[2556]: #011at java.base/java.lang.Thread.run(Thread.java:840)
java[2556]: Caused by: org.jruby.embed.InvokeFailedException: org.jruby.exceptions.RuntimeError: (RuntimeError) Got 1 failure(s) while initializing: File[/srv/puppet_code/environments/production]: change from 'absent' to 'directory' failed: Could not set 'directory' on ensure: Permission denied - /srv/puppet_code/environments/production
java[2556]: #011at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.doInvokeMethod(EmbedRubyObjectAdapterImpl.java:253)
java[2556]: #011at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.callMethod(EmbedRubyObjectAdapterImpl.java:162)
java[2556]: #011at org.jruby.embed.ScriptingContainer.callMethod(ScriptingContainer.java:1464)
java[2556]: #011at com.puppetlabs.jruby_utils.jruby.InternalScriptingContainer.callMethodWithArgArray(InternalScriptingContainer.java:43)
java[2556]: #011at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
java[2556]: #011at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
java[2556]: #011at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java[2556]: #011at java.base/java.lang.reflect.Method.invoke(Method.java:569)
java[2556]: #011at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:167)
java[2556]: #011at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:102)
java[2556]: #011at puppetlabs.services.jruby.jruby_puppet_core$eval27353$get_initialize_pool_instance_fn__27358$fn__27359$fn__27360.invoke(jruby_puppet_core.clj:141)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_internal$eval25225$create_pool_instance_BANG___25234$fn__25237.invoke(jruby_internal.clj:256)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_internal$eval25225$create_pool_instance_BANG___25234.invoke(jruby_internal.clj:225)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_agents$eval25651$add_instance__25656$fn__25660.invoke(jruby_agents.clj:52)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_agents$eval25651$add_instance__25656.invoke(jruby_agents.clj:47)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_agents$eval25678$prime_pool_BANG___25683$fn__25687.invoke(jruby_agents.clj:76)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_agents$eval25678$prime_pool_BANG___25683.invoke(jruby_agents.clj:61)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.instance_pool$eval28315$fn__28324$fn__28325.invoke(instance_pool.clj:16)
java[2556]: #011at puppetlabs.trapperkeeper.internal$shutdown_on_error_STAR_.invokeStatic(internal.clj:403)
java[2556]: #011at puppetlabs.trapperkeeper.internal$shutdown_on_error_STAR_.invoke(internal.clj:378)
java[2556]: #011at puppetlabs.trapperkeeper.internal$shutdown_on_error_STAR_.invokeStatic(internal.clj:388)
java[2556]: #011at puppetlabs.trapperkeeper.internal$shutdown_on_error_STAR_.invoke(internal.clj:378)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16950$shutdown_service__16955$fn$reify__16957$service_fnk__5716__auto___positional$reify__16962.shutdown_on_error(internal.clj:448)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16874$fn__16892$G__16866__16900.invoke(internal.clj:411)
java[2556]: #011at puppetlabs.trapperkeeper.internal$eval16874$fn__16892$G__16865__16909.invoke(internal.clj:411)
java[2556]: #011at clojure.core$partial$fn__5908.invoke(core.clj:2642)
java[2556]: #011at clojure.core$partial$fn__5908.invoke(core.clj:2641)
java[2556]: #011at puppetlabs.services.jruby_pool_manager.impl.jruby_agents$eval25625$send_agent__25630$fn__25631$agent_fn__25632.invoke(jruby_agents.clj:41)
java[2556]: #011at clojure.core$binding_conveyor_fn$fn__5823.invoke(core.clj:2050)
java[2556]: #011at clojure.lang.AFn.applyToHelper(AFn.java:154)
java[2556]: #011at clojure.lang.RestFn.applyTo(RestFn.java:132)
java[2556]: #011at clojure.lang.Agent$Action.doRun(Agent.java:114)
java[2556]: #011at clojure.lang.Agent$Action.run(Agent.java:163)
java[2556]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
java[2556]: #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java[2556]: #011... 1 more
java[2556]: Caused by: org.jruby.exceptions.RuntimeError: (RuntimeError) Got 1 failure(s) while initializing: File[/srv/puppet_code/environments/production]: change from 'absent' to 'directory' failed: Could not set 'directory' on ensure: Permission denied - /srv/puppet_code/environments/production
java[2556]: #011at RUBY.use(/usr/lib/ruby/vendor_ruby/puppet/settings.rb:1140)
java[2556]: #011at RUBY.apply(/usr/lib/ruby/vendor_ruby/puppet/resource/catalog.rb:248)
java[2556]: #011at RUBY.use(/usr/lib/ruby/vendor_ruby/puppet/settings.rb:1130)
java[2556]: #011at RUBY.initialize_puppet(uri:classloader:/puppetserver-lib/puppet/server/puppet_config.rb:91)
java[2556]: #011at RUBY.initialize(uri:classloader:/puppetserver-lib/puppet/server/master.rb:39)
java[2556]: #011at org.jruby.RubyClass.new(org/jruby/RubyClass.java:911)
systemd[1]: puppetserver.service: Main process exited, code=exited, status=1/FAILURE
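
The final 'Caused by' lines in the trace point at the underlying problem: the server could not create /srv/puppet_code/environments/production. A minimal sketch for narrowing this down is to find the closest ancestor that already exists and inspect its permissions. The function name nearest_existing_dir is hypothetical, and the 'puppet' service user in the usage comment is an assumption based on the chown commands in the task description.

```shell
#!/bin/sh
# Print the closest ancestor of a path that already exists
# (or the path itself if it exists).
# (Hypothetical helper for illustration.)
nearest_existing_dir() {
    p=$1
    while [ ! -e "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        p=$(dirname "$p")
    done
    printf '%s\n' "$p"
}

# Example (assumption: the service runs as the 'puppet' user):
#   d=$(nearest_existing_dir /srv/puppet_code/environments/production)
#   ls -ld "$d"
#   sudo -u puppet test -w "$d" && echo writable || echo "not writable"
```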

I welcome suggestions or assistance here. Thanks.

Hello @Dwisehaupt -- I don't know how things got into this state but I have tried to sort things out a bit. I believe the puppetserver is now running.

A couple of things I noticed (which may or may not have been your doing):

  1. Many files in /srv/git/operations/puppet were owned by 'puppet' and some were owned by 'gitpuppet'. I chgrp'd things again.
  2. Current deployment behavior (which mirrors production puppetservers) limits deployment to a branch named 'production'. I left your local checkout on a branch called 'andrewfounditlikethis' and checked out a production branch.

That got the puppetserver running. Note that there are now many local changes (visible with 'git status') on the production branch. So the exercise left for you is:

  1. git reset --hard origin in the production branch once you deem that to be safe
  2. cherry pick whatever local diffs you need into that production branch

I hope that makes sense! Let me know if you hit other road bumps.
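
Because a hard reset discards uncommitted work, one cautious way to approach the two steps above is to save the local modifications as a patch before resetting. This is only a sketch under stated assumptions: the function name save_and_reset and the patch path are hypothetical, the upstream ref is whatever you consider authoritative, and untracked files are not captured by git diff.

```shell
#!/bin/sh
# Save uncommitted changes to tracked files as a patch, then reset the
# current branch to an upstream ref.
# Arguments: repo path, upstream ref, patch file to write.
# (Hypothetical helper for illustration.)
save_and_reset() {
    repo=$1
    upstream=$2
    patch=$3
    # Record modifications relative to the current commit.
    git -C "$repo" diff HEAD > "$patch" || return 1
    # Discard local modifications and move the branch to the upstream ref.
    git -C "$repo" reset -q --hard "$upstream"
}

# Example (ref and patch path are assumptions):
#   save_and_reset /srv/git/operations/puppet origin/production /tmp/local.patch
```

The saved patch can then be reviewed and the parts still needed reapplied with git apply, while local work that already exists as commits on another branch can be brought over with git cherry-pick.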

@Andrew Thanks so much for that. I'm sure some of that was my doing or due to how my workflow was. I'll work through the changes and fix forward from here.

What's the status, is the old puppet 5 puppet master already out of service?

@MoritzMuehlenhoff At this point, the puppet5 puppet master is out of service or at least isn't doing any new updates. It will hopefully be powered down today.

The new puppet 7 puppetserver is up and running. I have built and am testing crm-dev-02 with the new puppetserver and things appear to all be working as desired. Any issues we run across will be fixed forward and I hope to transition the current cloudvps crm test instance to crm-dev-02 in the next few hours assuming the data copies and restores go as planned. Once those are complete and verified, I'll power off the two older instances.

Tested the full restore process on crm-dev-02. Adjusted the web proxy from community-crm to crm-dev-01 and tested basic functionality successfully. Powered off community-crm and puppetserver instances.

Dwisehaupt moved this task from In Progress to Done on the fundraising-tech-ops board.

Closing this out since things have been ok with the new instances.