
Puppet failures on deployment-deploy01.deployment-prep.eqiad.wmflabs
Closed, Resolved (Public)

Description

$ sudo -i puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Could not find data item profile::services_proxy::services in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/services_proxy.pp:20:19 on node deployment-deploy01.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Event Timeline

This also happens on deployment-deploy02.deployment-prep.eqiad.wmflabs, and probably on other Beta Cluster servers.

From the prod hieradata/common/profile/services_proxy.yaml:

# This list is mostly to be used inside mediawiki.
profile::services_proxy::services:
  search-chi_eqiad:
    port: 9243
    localport: 19243
    scheme: https
    hostname: search.svc.eqiad.wmnet
    timeout: 600
  search-psi_eqiad:
    port: 9643
    localport: 19643
    scheme: https
    hostname: search.svc.eqiad.wmnet
    timeout: 600
  search-omega_eqiad:
    port: 9443
    localport: 19443
    scheme: https
    hostname: search.svc.eqiad.wmnet
    timeout: 600
  search-chi_codfw:
    port: 9243
    localport: 14243
    scheme: https
    hostname: search.svc.codfw.wmnet
    timeout: 600
  search-psi_codfw:
    port: 9643
    localport: 14643
    scheme: https
    hostname: search.svc.codfw.wmnet
    timeout: 600
  search-omega_codfw:
    port: 9443
    localport: 14443
    scheme: https
    hostname: search.svc.codfw.wmnet
    timeout: 600

I made a quick and ugly hack to get Puppet working here again by adding this config to https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep:

"profile::services_proxy::services":
    dummy:
        port: 9999
        localport: 9999
        scheme: https
        hostname: dummy.example.net
        timeout: 600
"profile::services_proxy::ensure": absent

Someone who knows what bits of this should actually be running in the beta cluster and how to configure that should follow up. Pinging @Joe as the author of the profile that needed this hack.

I did keep beta in mind when writing this, as is apparent from labs/deployment-prep/common.yaml, which contains a Hiera key to ensure we won't use or install the services proxy there.

In a later refactor, though, I made profile::services_proxy::services non-optional. Sorry about that. Fixing it now.
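
The difference between a mandatory and an optional lookup can be sketched in plain Ruby. This is only an analogy: `HIERA_DATA`, `lookup_required` and `lookup_optional` are hypothetical names for illustration, not Hiera or Puppet APIs, and the hash stands in for whatever data a Beta Cluster node actually sees.

```ruby
# Hypothetical stand-in for the Hiera data visible to a Beta Cluster node.
# Note that profile::services_proxy::services is deliberately missing.
HIERA_DATA = { 'profile::services_proxy::ensure' => 'absent' }.freeze

# Mandatory lookup: a missing key is a hard failure, which is analogous to
# the "no default supplied" error in the description.
def lookup_required(data, key)
  data.fetch(key) do
    raise KeyError, "Could not find data item #{key} and no default supplied"
  end
end

# Optional lookup: a missing key falls back to the supplied default.
def lookup_optional(data, key, default)
  data.fetch(key, default)
end

services = lookup_optional(HIERA_DATA, 'profile::services_proxy::services', {})
puts "services declared: #{services.length}"
```

With an empty hash as the default, code that iterates over the result simply does nothing when no services are declared.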

Change 490805 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::services_proxy: make declaring services optional

https://gerrit.wikimedia.org/r/490805

Change 490805 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::services_proxy: make declaring services optional

https://gerrit.wikimedia.org/r/490805

@Joe that patch is apparently not quite sufficient. New output with the dummy settings from T216164#4955058 removed:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Failed to parse template profile/services_proxy/upstream_proxies.conf.erb:
  Filepath: /etc/puppet/modules/profile/templates/services_proxy/upstream_proxies.conf.erb
  Line: 1
  Detail: undefined method `each' for nil:NilClass
 at /etc/puppet/modules/profile/manifests/services_proxy.pp:37:20 on node deployment-deploy01.deployment-prep.eqiad.wmflabs
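
The `undefined method 'each' for nil:NilClass` detail suggests the template iterates over a value that is nil when no services are declared. Here is a minimal sketch of that failure and one possible guard, using Ruby's standard ERB library rather than Puppet's template machinery. The template text and the `services` variable name are assumptions for illustration, not the contents of upstream_proxies.conf.erb.

```ruby
require 'erb'

# Hypothetical template that loops over a services hash with no nil check.
UNGUARDED = <<~'TPL'
  <% services.each do |name, svc| -%>
  # <%= name %> -> <%= svc['hostname'] %>:<%= svc['port'] %>
  <% end -%>
TPL

# Same loop, but a nil value is treated as an empty hash.
GUARDED = <<~'TPL'
  <% (services || {}).each do |name, svc| -%>
  # <%= name %> -> <%= svc['hostname'] %>:<%= svc['port'] %>
  <% end -%>
TPL

# Render a template with `services` bound as a local variable.
def render(template, services)
  ERB.new(template, trim_mode: '-').result_with_hash(services: services)
end

begin
  render(UNGUARDED, nil)
rescue NoMethodError => e
  puts "unguarded template failed: #{e.class}"
end

puts "guarded template rendered #{render(GUARDED, nil).length} bytes"
```

Supplying an empty hash as the default at lookup time would avoid the same failure without touching the template.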

I have put the dummy settings back in https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep for now to keep puppet working. The puppet run at the moment also has apt failures but I think they were caused by something else.

Actually, it might be related after all. The failure comes from an attempt to install the nginx-full package, which seems to be triggered by profile::services_proxy's nginx::site { 'upstream_proxies': .... The profile as currently configured sets ensure => absent on that define, but the define itself contains include ::nginx, which is not passed the ensure => absent parameter and therefore still tries to install the nginx-full package and its related configuration.
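
The effect can be illustrated with a toy Ruby model in which a catalog is just a list of resource names. This is not Puppet code: both functions are hypothetical, and the first only approximates the behaviour described above.

```ruby
# Toy model of a define that always pulls in the nginx class, regardless of
# the ensure value it was given (the behaviour described above).
def site_always_include(catalog, name, ensure_state)
  catalog << 'Class[Nginx]'
  catalog << "Nginx::Site[#{name}] ensure=#{ensure_state}"
  catalog
end

# Toy model of a define that pulls in the nginx class only when the site is
# meant to be present.
def site_include_if_present(catalog, name, ensure_state)
  catalog << 'Class[Nginx]' if ensure_state == 'present'
  catalog << "Nginx::Site[#{name}] ensure=#{ensure_state}"
  catalog
end

puts site_always_include([], 'upstream_proxies', 'absent').inspect
puts site_include_if_present([], 'upstream_proxies', 'absent').inspect
```

In the first case the nginx class, and with it the package installation, ends up in the catalog even though the site itself is absent. In the second it does not.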

Change 504781 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet/nginx@master] site: Only include ::nginx if ensure present

https://gerrit.wikimedia.org/r/504781

Reedy added a subscriber: Reedy.
root@deployment-deploy01:/home/reedy# sudo -i puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for deployment-deploy01.deployment-prep.eqiad.wmflabs
Info: Applying configuration version '(079299c409) root - redis::multidc: Make discovery optional'
Notice: The LDAP client stack for this host is: classic/sudoldap
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: classic/sudoldap'
Notice: Applied catalog in 15.65 seconds
root@deployment-deploy01:/home/reedy#