Upgrade puppetmaster1001 and puppetmaster2001
Details
Status | Assigned | Task
---|---|---
Resolved | jbond | T228657 Upgrade Puppet Masters and Puppet DB servers
Resolved | jbond | T234315 upgrade puppet master frontends servers
Resolved | jbond | T234332 ensure additional puppetmaster files are managed by puppet
Resolved | jbond | T235077 apache fails to start on buster to to an SSL error
Declined | None | T235185 missing CRL
Event Timeline
Change 540096 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet: point puppet,eqsin.wmnet to puppetmaster1001.eqiad.wmnet
Change 540096 merged by Jbond:
[operations/dns@master] puppet: point puppet,eqsin.wmnet to puppetmaster1001.eqiad.wmnet
Change 540100 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet: point puppet.ulsfo.wmnet to puppetmaster1001.eqiad.wmnet
Change 540100 merged by Jbond:
[operations/dns@master] puppet: point puppet.ulsfo.wmnet to puppetmaster1001.eqiad.wmnet
Change 540102 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet: point puppet.codfw.wmnet to puppetmaster1001.eqiad.wmnet
Change 540102 merged by Jbond:
[operations/dns@master] puppet: point puppet.codfw.wmnet to puppetmaster1001.eqiad.wmnet
Change 540130 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster2001: move eqsin back to puppetmaster2001
Change 540133 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster2001: move codfw back to puppetmaster2001
Change 540130 merged by Jbond:
[operations/dns@master] puppetmaster2001: move eqsin back to puppetmaster2001
Change 540133 merged by Jbond:
[operations/dns@master] puppetmaster2001: move codfw back to puppetmaster2001
Change 540367 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster1001: migrate esams puppet traffic to codfw
Change 540368 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster1001: move eqaid puppet to codfw
Change 540367 merged by Jbond:
[operations/dns@master] puppetmaster1001: migrate esams puppet traffic to codfw
Change 540368 merged by Jbond:
[operations/dns@master] puppetmaster1001: move eqaid puppet to codfw
Change 540390 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster1001: update config-manager to prepare for reimage
Change 540392 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster1001: move ca to puppetmaster2001 for reimage
Change 540393 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pybal_config: remove puppetmaster1001 from pybal_config backend
Change 540392 merged by Jbond:
[operations/puppet@production] puppetmaster1001: move ca to puppetmaster2001 for reimage
Change 540390 merged by Jbond:
[operations/dns@master] puppetmaster1001: update config-master to prepare for reimage
Change 540431 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster_ca: move ca functions to puppetmaster2001
Change 540431 merged by Jbond:
[operations/puppet@production] puppetmaster_ca: move ca functions to puppetmaster2001
During the upgrade of the puppet masters, I attempted to move the puppetmaster_ca from puppetmaster1001 to puppetmaster2001. However, when I attempted this we saw a whole slew of errors. Looking at the Apache logs, we see the following stack trace:
  [ 2019-10-02 15:51:08.8008 17543/7f8a90079700 age/Cor/App/Implementation.cpp:304 ]: Could not spawn process for application /usr/share/puppet/rack/puppet-master: An error occurred while starting up the preloader.
  Error ID: 8aaa24bb
  Error details saved to: /tmp/passenger-error-d7nHyI.html
  Message from application: exit (SystemExit)
  /usr/lib/ruby/vendor_ruby/puppet/util.rb:554:in `exit'
  /usr/lib/ruby/vendor_ruby/puppet/util.rb:554:in `rescue in exit_on_fail'
  /usr/lib/ruby/vendor_ruby/puppet/util.rb:540:in `exit_on_fail'
  /usr/lib/ruby/vendor_ruby/puppet/application.rb:341:in `run'
  /usr/lib/ruby/vendor_ruby/puppet/util/command_line.rb:132:in `run'
  /usr/lib/ruby/vendor_ruby/puppet/util/command_line.rb:72:in `execute'
  config.ru:111:in `block in <main>'
  /usr/lib/ruby/vendor_ruby/rack/builder.rb:55:in `instance_eval'
  /usr/lib/ruby/vendor_ruby/rack/builder.rb:55:in `initialize'
  config.ru:1:in `new'
  config.ru:1:in `<main>'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:110:in `eval'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:110:in `preload_app'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:156:in `<module:App>'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:30:in `<module:PhusionPassenger>'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:29:in `<main>'
We also see the following errors in the output of puppet agent -t:
  Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': Error 500 on SERVER: [Phusion Passenger HTML error page "We're sorry, but something went wrong (500)": "The Phusion Passenger application server encountered an error while starting your web application." Details are omitted in production mode; see the Passenger log file.]
  Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///pluginfacts
Today I noticed that the UID and GID of the puppet user on puppetmaster1001 differ from those on puppetmaster2001. This is causing issues for at least GeoIP. However, even after manually correcting this, clients still receive:
  Error: /Stage[main]/Geoip::Data::Puppet/File[/usr/share/GeoIP]: Failed to generate additional resources using 'eval_generate': Error 500 on SERVER: Server Error: Not authorized to call search on /file_metadata/volatile/GeoIP with {:links=>"manage", :recurse=>true, :source_permissions=>"ignore", :checksum_type=>"md5"}
  Error: /Stage[main]/Geoip::Data::Puppet/File[/usr/share/GeoIP]: Could not evaluate: Could not retrieve file metadata for puppet:///volatile/GeoIP: Error 500 on SERVER: Server Error: Not authorized to call find on /file_metadata/volatile/GeoIP with {:links=>"manage", :checksum_type=>"md5", :source_permissions=>"ignore"}
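One way to spot the UID/GID mismatch mentioned above is to compare the puppet entries in /etc/passwd and /etc/group on both masters. The following is a minimal Python sketch, assuming the contents of those files have already been collected from each host (for example via cumin). The helper names (puppet_ids, find_id_mismatch) and the sample entries and numeric IDs are hypothetical placeholders, not the real values from puppetmaster1001 or puppetmaster2001.

```python
"""Compare the puppet user's UID/GID across hosts.

Hypothetical sketch: the passwd/group contents in HOSTS are illustrative
placeholders, not the real entries from puppetmaster1001/2001.
"""
from typing import Dict, Optional, Tuple

IdPair = Tuple[Optional[int], Optional[int]]


def puppet_ids(passwd_text: str, group_text: str, user: str = "puppet") -> IdPair:
    """Return (uid, gid) for `user`, or None for any entry that is missing."""
    uid: Optional[int] = None
    gid: Optional[int] = None
    for line in passwd_text.splitlines():
        # passwd format: name:password:UID:GID:GECOS:home:shell
        fields = line.split(":")
        if len(fields) >= 4 and fields[0] == user:
            uid = int(fields[2])
    for line in group_text.splitlines():
        # group format: name:password:GID:members
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == user:
            gid = int(fields[2])
    return uid, gid


def find_id_mismatch(
    hosts: Dict[str, Tuple[str, str]], user: str = "puppet"
) -> Dict[str, IdPair]:
    """Map host -> (uid, gid) if the hosts disagree; return {} if they all match."""
    ids = {host: puppet_ids(p, g, user) for host, (p, g) in hosts.items()}
    return ids if len(set(ids.values())) > 1 else {}


# Hypothetical sample data: (passwd contents, group contents) per host.
HOSTS = {
    "puppetmaster1001": (
        "puppet:x:110:115:puppet daemon:/var/lib/puppet:/usr/sbin/nologin\n",
        "puppet:x:115:\n",
    ),
    "puppetmaster2001": (
        "puppet:x:112:117:puppet daemon:/var/lib/puppet:/usr/sbin/nologin\n",
        "puppet:x:117:\n",
    ),
}

if __name__ == "__main__":
    for host, (uid, gid) in sorted(find_id_mismatch(HOSTS).items()):
        print(f"{host}: uid={uid} gid={gid}")
```

With the placeholder data above, both hosts are reported because their IDs differ; an empty result means the puppet user and group have the same numeric IDs everywhere.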
Change 541545 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppet::rsync: disable chroot on volatile and ssl rsync
Change 541545 merged by Jbond:
[operations/puppet@production] puppet::rsync: disable chroot on volatile and ssl rsync
Change 541781 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster: change active puppetmaster ca to codfw
Change 541781 merged by Jbond:
[operations/puppet@production] puppetmaster: change active puppetmaster ca to codfw
Change 541804 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet/config-master: move services from ulsfo, eqsin and esams to eqiad
Change 541807 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pybal_config: disable pybal backend in codfw to prepare for reimage
Change 541810 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster2001: switch boot options to install with buster
Change 541810 merged by Jbond:
[operations/puppet@production] puppetmaster2001: switch boot options to install with buster
Change 541807 merged by Jbond:
[operations/puppet@production] pybal_config: disable pybal backend in codfw to prepare for reimage
Change 541804 merged by Jbond:
[operations/dns@master] puppet/config-master: move services from ulsfo, eqsin and esams to eqiad
Change 541818 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] config-manager/puppet: move codfw entries to eqiad
Change 541818 merged by Jbond:
[operations/dns@master] config-manager/puppet: move codfw entries to eqiad
Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:
['puppetmaster2001.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201910091349_jbond_34479.log.
Completed auto-reimage of hosts:
['puppetmaster2001.codfw.wmnet']
Of which those FAILED:
['puppetmaster2001.codfw.wmnet']
Change 541834 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.codfw.wmnet: move cname back to puppetmaster2001
Change 541834 merged by Jbond:
[operations/dns@master] puppet.codfw.wmnet: move cname back to puppetmaster2001
Change 541838 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster/pybal_config: move ca and pybal_config to puppetmaster2001
Change 542070 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.wikimedia.org: move to codfw
Change 542070 merged by Jbond:
[operations/dns@master] puppet.wikimedia.org: move to codfw
Change 542078 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.eqiad.wmnet: move to codfw
Change 542078 merged by Jbond:
[operations/dns@master] puppet.eqiad.wmnet: move to codfw
Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:
['puppetmaster1001.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201910101333_jbond_37870.log.
Change 542105 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster1001: updateboot image to buster
Change 542105 merged by Jbond:
[operations/puppet@production] puppetmaster1001: updateboot image to buster
Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:
['puppetmaster1001.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201910101348_jbond_40696.log.
Completed auto-reimage of hosts:
['puppetmaster1001.eqiad.wmnet']
Of which those FAILED:
['puppetmaster1001.eqiad.wmnet']
Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:
['puppetmaster1001.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201910101350_jbond_40938.log.
Completed auto-reimage of hosts:
['puppetmaster1001.eqiad.wmnet']
Of which those FAILED:
['puppetmaster1001.eqiad.wmnet']
Change 542108 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster: update ca to puppetmaster2002
Change 542108 merged by Jbond:
[operations/puppet@production] puppetmaster: update ca to puppetmaster2002
Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:
['puppetmaster1001.eqiad.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201910101359_jbond_46457.log.
Completed auto-reimage of hosts:
['puppetmaster1001.eqiad.wmnet']
Of which those FAILED:
['puppetmaster1001.eqiad.wmnet']
Change 542137 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.eqsin.wmnet: move back to eqiad
Change 542138 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.eqiad.wmnet: moveback to puppetmaster1001
Change 542137 merged by Jbond:
[operations/dns@master] puppet.eqsin.wmnet: move back to eqiad
Change 542154 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.wikimedia.org.: move back to puppetmaster1001
Change 542154 merged by Jbond:
[operations/dns@master] puppet.wikimedia.org.: move back to puppetmaster1001
Change 542138 merged by Jbond:
[operations/dns@master] puppet.eqiad.wmnet: moveback to puppetmaster1001
Change 542924 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] config-master: move eqsin and eqiad config master back to eqiad
Change 542926 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pybal_backend: enable codfw endpoint
Change 542929 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pupetmasters: remove local server config
Change 542924 merged by Jbond:
[operations/dns@master] config-master: move eqsin and eqiad config master back to eqiad
Change 541838 abandoned by Jbond:
puppetmaster/pybal_config: move ca and pybal_config to puppetmaster2001
Change 542926 merged by Jbond:
[operations/puppet@production] pybal_backend: enable codfw endpoint
Change 542929 merged by Jbond:
[operations/puppet@production] pupetmasters: remove local server config
Change 540393 abandoned by Jbond:
pybal_config: remove puppetmaster1001 from pybal_config backend
Change 543148 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster: offline rhodium
Change 543148 merged by Jbond:
[operations/puppet@production] puppetmaster: offline rhodium