
Upgrade puppet master frontend servers
Closed, Resolved; Public

Description

upgrade puppetmaster1001 and 2001

Details

Repo               Branch      Lines +/-
operations/puppet  production  +1 -1
operations/puppet  production  +1 -1
operations/puppet  production  +0 -3
operations/puppet  production  +1 -1
operations/puppet  production  +3 -3
operations/dns     master      +2 -2
operations/dns     master      +1 -1
operations/dns     master      +1 -1
operations/dns     master      +1 -1
operations/puppet  production  +1 -1
operations/puppet  production  +1 -0
operations/dns     master      +2 -2
operations/dns     master      +1 -1
operations/dns     master      +2 -2
operations/dns     master      +2 -2
operations/dns     master      +7 -7
operations/puppet  production  +1 -1
operations/puppet  production  +1 -0
operations/puppet  production  +1 -1
operations/puppet  production  +20 -14
operations/puppet  production  +1 -1
operations/puppet  production  +1 -1
operations/dns     master      +5 -5
operations/dns     master      +1 -1
operations/dns     master      +1 -1
operations/dns     master      +1 -1
operations/dns     master      +1 -1
operations/dns     master      +1 -1
operations/dns     master      +1 -1
operations/dns     master      +1 -1

Event Timeline

jbond triaged this task as Medium priority. Oct 1 2019, 10:50 AM

Change 540096 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet: point puppet,eqsin.wmnet to puppetmaster1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/540096

Change 540096 merged by Jbond:
[operations/dns@master] puppet: point puppet,eqsin.wmnet to puppetmaster1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/540096

Change 540100 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet: point puppet.ulsfo.wmnet to puppetmaster1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/540100

Change 540100 merged by Jbond:
[operations/dns@master] puppet: point puppet.ulsfo.wmnet to puppetmaster1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/540100

Change 540102 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet: point puppet.codfw.wmnet to puppetmaster1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/540102

Change 540102 merged by Jbond:
[operations/dns@master] puppet: point puppet.codfw.wmnet to puppetmaster1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/540102

Change 540130 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster2001: move eqsin back to puppetmaster2001

https://gerrit.wikimedia.org/r/540130

Change 540133 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster2001: move codfw back to puppetmaster2001

https://gerrit.wikimedia.org/r/540133

Change 540130 merged by Jbond:
[operations/dns@master] puppetmaster2001: move eqsin back to puppetmaster2001

https://gerrit.wikimedia.org/r/540130

Change 540133 merged by Jbond:
[operations/dns@master] puppetmaster2001: move codfw back to puppetmaster2001

https://gerrit.wikimedia.org/r/540133

Change 540367 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster1001: migrate esams puppet traffic to codfw

https://gerrit.wikimedia.org/r/540367

Change 540368 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster1001: move eqaid puppet to codfw

https://gerrit.wikimedia.org/r/540368

Change 540367 merged by Jbond:
[operations/dns@master] puppetmaster1001: migrate esams puppet traffic to codfw

https://gerrit.wikimedia.org/r/540367

Change 540368 merged by Jbond:
[operations/dns@master] puppetmaster1001: move eqaid puppet to codfw

https://gerrit.wikimedia.org/r/540368

Change 540390 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppetmaster1001: update config-manager to prepare for reimage

https://gerrit.wikimedia.org/r/540390

Change 540392 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster1001: move ca to puppetmaster2001 for reimage

https://gerrit.wikimedia.org/r/540392

Change 540393 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pybal_config: remove puppetmaster1001 from pybal_config backend

https://gerrit.wikimedia.org/r/540393

Change 540392 merged by Jbond:
[operations/puppet@production] puppetmaster1001: move ca to puppetmaster2001 for reimage

https://gerrit.wikimedia.org/r/540392

Change 540390 merged by Jbond:
[operations/dns@master] puppetmaster1001: update config-master to prepare for reimage

https://gerrit.wikimedia.org/r/540390

Change 540431 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster_ca: move ca functions to puppetmaster2001

https://gerrit.wikimedia.org/r/540431

Change 540431 merged by Jbond:
[operations/puppet@production] puppetmaster_ca: move ca functions to puppetmaster2001

https://gerrit.wikimedia.org/r/540431

During the upgrade of the puppet master I attempted to move the puppetmaster_ca from puppetmaster1001 to puppetmaster2001. However, when I attempted this we saw a whole slew of errors. Looking at the Apache logs, we see the following stack trace:

[ 2019-10-02 15:51:08.8008 17543/7f8a90079700 age/Cor/App/Implementation.cpp:304 ]: Could not spawn process for application /usr/share/puppet/rack/puppet-master: An error occurred while starting up the preloader.
  Error ID: 8aaa24bb
  Error details saved to: /tmp/passenger-error-d7nHyI.html
  Message from application: exit (SystemExit)
  /usr/lib/ruby/vendor_ruby/puppet/util.rb:554:in `exit'
  /usr/lib/ruby/vendor_ruby/puppet/util.rb:554:in `rescue in exit_on_fail'
  /usr/lib/ruby/vendor_ruby/puppet/util.rb:540:in `exit_on_fail'
  /usr/lib/ruby/vendor_ruby/puppet/application.rb:341:in `run'
  /usr/lib/ruby/vendor_ruby/puppet/util/command_line.rb:132:in `run'
  /usr/lib/ruby/vendor_ruby/puppet/util/command_line.rb:72:in `execute'
  config.ru:111:in `block in <main>'
  /usr/lib/ruby/vendor_ruby/rack/builder.rb:55:in `instance_eval'
  /usr/lib/ruby/vendor_ruby/rack/builder.rb:55:in `initialize'
  config.ru:1:in `new'
  config.ru:1:in `<main>'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:110:in `eval'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:110:in `preload_app'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:156:in `<module:App>'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:30:in `<module:PhusionPassenger>'
  /usr/share/passenger/helper-scripts/rack-preloader.rb:29:in `<main>'
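
When several of these failures pile up, it can help to pull the error ID and the saved details file out of the log, since the HTML file under /tmp usually carries more detail than the trace. The following is only a rough sketch: the function name `summarize_passenger_error` is hypothetical, and the field labels are assumed to match the Passenger log excerpt above.

```python
import re

# Excerpt copied from the Apache/Passenger log above.
LOG_EXCERPT = """\
[ 2019-10-02 15:51:08.8008 17543/7f8a90079700 age/Cor/App/Implementation.cpp:304 ]: Could not spawn process for application /usr/share/puppet/rack/puppet-master: An error occurred while starting up the preloader.
  Error ID: 8aaa24bb
  Error details saved to: /tmp/passenger-error-d7nHyI.html
  Message from application: exit (SystemExit)
"""


def summarize_passenger_error(text):
    """Return the error ID, details file and application message, if present.

    Hypothetical helper: the labels below are taken from the log excerpt,
    not from any Passenger API.
    """
    fields = {
        "error_id": r"^\s*Error ID:\s*(\S+)",
        "details_file": r"^\s*Error details saved to:\s*(\S+)",
        "message": r"^\s*Message from application:\s*(.+)$",
    }
    summary = {}
    for name, pattern in fields.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        # Missing fields are reported as None rather than raising.
        summary[name] = match.group(1).strip() if match else None
    return summary


if __name__ == "__main__":
    for key, value in summarize_passenger_error(LOG_EXCERPT).items():
        print(f"{key}: {value}")
```

The details file named in the summary is the one to read on the puppet master itself.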

And we see the following error in the puppet agent -t output:

Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': Error 500 on SERVER: <!DOCTYPE html>
<html>
<head>
  <title>We're sorry, but something went wrong (500)</title>
  <style type="text/css">
    body { background-color: #fff; color: #666; text-align: center; font-family: arial, sans-serif; }
    .dialog {
      width: 25em;
      padding: 0 4em;
      margin: 4em auto 0 auto;
      border: 1px solid #ccc;
      border-right-color: #999;
      border-bottom-color: #999;
    }
    h1 { font-size: 100%; color: #f00; line-height: 1.5em; }
    #operator_info_panel {
      width: 27em;
      margin: 4em auto 0 auto;
      line-height: 1.2em;
    }
    #show_operator_info { text-decoration: none; color: #99f; font-size: smaller; }
    #show_operator_info:hover { text-decoration: underline; }
    #operator_info { color: #444; text-align: justify; }
  </style>
</head>

<body>
  <div class="dialog">
    <h1>We're sorry, but something went wrong.</h1>
    <p>We've been notified about this issue and we'll take a look at it shortly.</p>
  </div>
  <div id="operator_info_panel">
    <a id="show_operator_info" href="javascript:void(showOperatorInfo())">Information for the administrator of this website</a>
    <div id="operator_info" style="display: none">
      <p>The Phusion Passenger application server encountered an error while starting your web application.
        Because you are running this web application in staging or production mode, the details of the error
        have been omitted from this web page for security reasons.</p>
      <p><strong>Please read <a href="https://www.phusionpassenger.com/library/admin/log_file/">the Passenger log file</a> to find the details of the error.</strong></p>
      <p>Alternatively, you can turn on the "friendly error pages" feature (see below), which will make Phusion Passenger show many details about the error right in the browser.</p>
      <p>To turn on friendly error pages:</p>
      <ul>
        <li><a href="https://www.phusionpassenger.com/library/config/nginx/reference/#passenger_friendly_error_pages">Nginx integration mode</a></li>
        <li><a href="https://www.phusionpassenger.com/library/config/apache/reference/#passengerfriendlyerrorpages">Apache integration mode</a></li>
        <li><a href="https://www.phusionpassenger.com/library/config/standalone/reference/#--friendly-error-pages---no-friendly-error-pages-friendly_error_pages">Standalone mode</a></li>
      </ul>
    </div>
  </div>

  <script>
    function showOperatorInfo() {
      document.getElementById('operator_info').style.display = 'block';
    }
  </script>
</body>
</html>

Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///pluginfacts

Today I noticed that the UID and GID for the puppet user on puppetmaster1001 differ from those on puppetmaster2001. This is causing issues for at least GeoIP. However, even after fixing this manually, clients still receive:

Error: /Stage[main]/Geoip::Data::Puppet/File[/usr/share/GeoIP]: Failed to generate additional resources using 'eval_generate': Error 500 on SERVER: Server Error: Not authorized to call search on /file_metadata/volatile/GeoIP with {:links=>"manage", :recurse=>true, :source_permissions=>"ignore", :checksum_type=>"md5"}
Error: /Stage[main]/Geoip::Data::Puppet/File[/usr/share/GeoIP]: Could not evaluate: Could not retrieve file metadata for puppet:///volatile/GeoIP: Error 500 on SERVER: Server Error: Not authorized to call find on /file_metadata/volatile/GeoIP with {:links=>"manage", :checksum_type=>"md5", :source_permissions=>"ignore"}
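
A UID/GID mismatch like the one described above can be checked before files are synced between hosts. The sketch below compares the passwd entry for the puppet user as collected from each host (for example with `getent passwd puppet`). The function names are hypothetical, and the numeric IDs in `SAMPLE` are made up for illustration; they are not the real values from these servers.

```python
def parse_passwd_entry(line):
    """Parse one passwd(5) line into (name, uid, gid)."""
    fields = line.strip().split(":")
    if len(fields) < 4:
        raise ValueError(f"not a passwd entry: {line!r}")
    return fields[0], int(fields[2]), int(fields[3])


def id_mismatches(entries_by_host):
    """Return {"uid"/"gid": {host: value}} for each ID that differs between hosts."""
    parsed = {host: parse_passwd_entry(line) for host, line in entries_by_host.items()}
    mismatches = {}
    for index, field in ((1, "uid"), (2, "gid")):
        values = {host: entry[index] for host, entry in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[field] = values
    return mismatches


# Hypothetical sample data: these IDs are invented, not taken from the task.
SAMPLE = {
    "puppetmaster1001": "puppet:x:111:115:Puppet:/var/lib/puppet:/bin/false",
    "puppetmaster2001": "puppet:x:107:112:Puppet:/var/lib/puppet:/bin/false",
}

if __name__ == "__main__":
    for field, values in id_mismatches(SAMPLE).items():
        print(f"{field} differs: {values}")
```

An empty result means the IDs agree, so numeric ownership on files copied between the hosts should map to the same user and group on both sides.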

Change 541545 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppet::rsync: disable chroot on volatile and ssl rsync

https://gerrit.wikimedia.org/r/541545

Change 541545 merged by Jbond:
[operations/puppet@production] puppet::rsync: disable chroot on volatile and ssl rsync

https://gerrit.wikimedia.org/r/541545

Change 541781 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster: change active puppetmaster ca to codfw

https://gerrit.wikimedia.org/r/541781

Change 541781 merged by Jbond:
[operations/puppet@production] puppetmaster: change active puppetmaster ca to codfw

https://gerrit.wikimedia.org/r/541781

Change 541804 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet/config-master: move services from ulsfo, eqsin and esams to eqiad

https://gerrit.wikimedia.org/r/541804

Change 541807 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pybal_config: disable pybal backend in codfw to prepare for reimage

https://gerrit.wikimedia.org/r/541807

Change 541810 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster2001: switch boot options to install with buster

https://gerrit.wikimedia.org/r/541810

Change 541810 merged by Jbond:
[operations/puppet@production] puppetmaster2001: switch boot options to install with buster

https://gerrit.wikimedia.org/r/541810

Change 541807 merged by Jbond:
[operations/puppet@production] pybal_config: disable pybal backend in codfw to prepare for reimage

https://gerrit.wikimedia.org/r/541807

Change 541804 merged by Jbond:
[operations/dns@master] puppet/config-master: move services from ulsfo, eqsin and esams to eqiad

https://gerrit.wikimedia.org/r/541804

Change 541818 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] config-manager/puppet: move codfw entries to eqiad

https://gerrit.wikimedia.org/r/541818

Change 541818 merged by Jbond:
[operations/dns@master] config-manager/puppet: move codfw entries to eqiad

https://gerrit.wikimedia.org/r/541818

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['puppetmaster2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910091349_jbond_34479.log.

Completed auto-reimage of hosts:

['puppetmaster2001.codfw.wmnet']

Of which those FAILED:

['puppetmaster2001.codfw.wmnet']

Change 541834 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.codfw.wmnet: move cname back to puppetmaster2001

https://gerrit.wikimedia.org/r/541834

Change 541834 merged by Jbond:
[operations/dns@master] puppet.codfw.wmnet: move cname back to puppetmaster2001

https://gerrit.wikimedia.org/r/541834

Change 541838 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster/pybal_config: move ca and pybal_config to puppetmaster2001

https://gerrit.wikimedia.org/r/541838

Change 542070 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.wikimedia.org: move to codfw

https://gerrit.wikimedia.org/r/542070

Change 542070 merged by Jbond:
[operations/dns@master] puppet.wikimedia.org: move to codfw

https://gerrit.wikimedia.org/r/542070

Change 542078 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.eqiad.wmnet: move to codfw

https://gerrit.wikimedia.org/r/542078

Change 542078 merged by Jbond:
[operations/dns@master] puppet.eqiad.wmnet: move to codfw

https://gerrit.wikimedia.org/r/542078

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['puppetmaster1001.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910101333_jbond_37870.log.

Change 542105 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster1001: updateboot image to buster

https://gerrit.wikimedia.org/r/542105

Change 542105 merged by Jbond:
[operations/puppet@production] puppetmaster1001: updateboot image to buster

https://gerrit.wikimedia.org/r/542105

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['puppetmaster1001.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910101348_jbond_40696.log.

Completed auto-reimage of hosts:

['puppetmaster1001.eqiad.wmnet']

Of which those FAILED:

['puppetmaster1001.eqiad.wmnet']

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['puppetmaster1001.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910101350_jbond_40938.log.

Completed auto-reimage of hosts:

['puppetmaster1001.eqiad.wmnet']

Of which those FAILED:

['puppetmaster1001.eqiad.wmnet']

Change 542108 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster: update ca to puppetmaster2002

https://gerrit.wikimedia.org/r/542108

Change 542108 merged by Jbond:
[operations/puppet@production] puppetmaster: update ca to puppetmaster2002

https://gerrit.wikimedia.org/r/542108

Script wmf-auto-reimage was launched by jbond on cumin1001.eqiad.wmnet for hosts:

['puppetmaster1001.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910101359_jbond_46457.log.

Completed auto-reimage of hosts:

['puppetmaster1001.eqiad.wmnet']

Of which those FAILED:

['puppetmaster1001.eqiad.wmnet']

Change 542137 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.eqsin.wmnet: move back to eqiad

https://gerrit.wikimedia.org/r/542137

Change 542138 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.eqiad.wmnet: moveback to puppetmaster1001

https://gerrit.wikimedia.org/r/542138

Change 542137 merged by Jbond:
[operations/dns@master] puppet.eqsin.wmnet: move back to eqiad

https://gerrit.wikimedia.org/r/542137

Change 542154 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] puppet.wikimedia.org.: move back to puppetmaster1001

https://gerrit.wikimedia.org/r/542154

Change 542154 merged by Jbond:
[operations/dns@master] puppet.wikimedia.org.: move back to puppetmaster1001

https://gerrit.wikimedia.org/r/542154

Change 542138 merged by Jbond:
[operations/dns@master] puppet.eqiad.wmnet: moveback to puppetmaster1001

https://gerrit.wikimedia.org/r/542138

Change 542924 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/dns@master] config-master: move eqsin and eqiad config master back to eqiad

https://gerrit.wikimedia.org/r/542924

Change 542926 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pybal_backend: enable codfw endpoint

https://gerrit.wikimedia.org/r/542926

Change 542929 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] pupetmasters: remove local server config

https://gerrit.wikimedia.org/r/542929

Change 542924 merged by Jbond:
[operations/dns@master] config-master: move eqsin and eqiad config master back to eqiad

https://gerrit.wikimedia.org/r/542924

Change 541838 abandoned by Jbond:
puppetmaster/pybal_config: move ca and pybal_config to puppetmaster2001

https://gerrit.wikimedia.org/r/541838

Change 542926 merged by Jbond:
[operations/puppet@production] pybal_backend: enable codfw endpoint

https://gerrit.wikimedia.org/r/542926

Change 542929 merged by Jbond:
[operations/puppet@production] pupetmasters: remove local server config

https://gerrit.wikimedia.org/r/542929

Change 540393 abandoned by Jbond:
pybal_config: remove puppetmaster1001 from pybal_config backend

https://gerrit.wikimedia.org/r/540393

Change 543148 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster: offline rhodium

https://gerrit.wikimedia.org/r/543148

Change 543148 merged by Jbond:
[operations/puppet@production] puppetmaster: offline rhodium

https://gerrit.wikimedia.org/r/543148