
deployment-restbase02 puppet broken
Closed, Resolved · Public

Description

Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'profile::envoy::ensure' (file: /etc/puppet/modules/profile/manifests/envoy.pp, line: 5) on node deployment-restbase02.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
cscott@deployment-restbase02:/etc/restbase$

Event Timeline

cscott created this task. Mar 11 2020, 9:43 PM
Restricted Application added a subscriber: Aklapper. Mar 11 2020, 9:43 PM

Related to T247147: Puppet fails on Beta Cluster because "did not find a value for the name 'profile::envoy::ensure'". @Jdforrester-WMF says, "cscott: Oh, yes, I only fixed that for parsoid11. Someone needs to tell me how to fix it for the whole of beta cluster."

All of Beta Cluster has that error.
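The error above names the Hiera key that lookup() could not resolve. A minimal sketch for pulling that key out of an agent log and asking Puppet how it resolves for a given node. The function names missing_hiera_key and explain_key are hypothetical, and it is an assumption that `puppet lookup` is run on a host where the Puppet server's Hiera data is readable.

```shell
#!/bin/sh
# Hypothetical helper: read Puppet agent output on stdin and print the Hiera
# key named in a "did not find a value for the name '...'" error.
missing_hiera_key() {
    sed -n "s/.*did not find a value for the name '\([^']*\)'.*/\1/p"
}

# Hypothetical helper: ask Puppet to explain how a key resolves for a node.
# Assumes it runs where the Hiera hierarchy is available (e.g. the Puppet server).
explain_key() {
    key=$1
    node=$2
    sudo puppet lookup "$key" --node "$node" --explain
}
```

Possible usage: `sudo puppet agent -tv 2>&1 | missing_hiera_key` on the affected instance, then `explain_key profile::envoy::ensure deployment-restbase02.deployment-prep.eqiad.wmflabs` on the Puppet server to see which hierarchy levels were searched.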

Change 579070 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[operations/puppet@production] Ensure puppet works on beta cluster by allowing envoy to be absent

https://gerrit.wikimedia.org/r/579070

Change 579070 merged by Dzahn:
[operations/puppet@production] Ensure puppet works on beta cluster by allowing envoy to be absent

https://gerrit.wikimedia.org/r/579070

https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/579070 merged, which helped, but now:

$ sudo puppet agent -tv
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: Error 500 on SERVER: Server Error: Could not retrieve facts for deployment-restbase02.deployment-prep.eqiad.wmflabs: Failed to find facts from PuppetDB at puppet:8140: Failed to execute '/pdb/query/v4/nodes/deployment-restbase02.deployment-prep.eqiad.wmflabs/facts' on at least 1 of the following 'server_urls': https://deployment-puppetdb03.deployment-prep.eqiad.wmflabs
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Failed to execute '/pdb/cmd/v1?checksum=19e0362414c0c1493db8e04908d36fee35aff424&version=5&certname=deployment-restbase02.deployment-prep.eqiad.wmflabs&command=replace_facts&producer-timestamp=2020-03-11T22:26:26.389Z' on at least 1 of the following 'server_urls': https://deployment-puppetdb03.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

so...

<volans> and WMCS doesn't have a centralized puppetdb
<volans> deployment-prep might have it I don't recall
<Krenair> we do
<Krenair> though looks like maybe something's up with it
<Krenair> yeah it got OOM killed again
<cscott> anyway, puppet is still b0rked on deployment-restbase02, because (it seems) something's wrong with https://deployment-puppetdb03.deployment-prep.eqiad.wmflabs
<Krenair> yes
<Krenair> that's what I said
<Krenair> wondering if I can make that puppetdb instance larger

I'm going to restart puppetdb, as it got OOM killed again. I want to look at doing something like T247206 to just make the instance bigger. I guess newer puppetdb uses more RAM than before.
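A rough sketch of how an OOM kill could be confirmed on the PuppetDB host before restarting the service. It is an assumption that PuppetDB runs as a systemd unit named puppetdb and that the kernel log is readable through journalctl. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: keep only kernel log lines that report an OOM kill.
oom_kills() {
    grep -Ei 'out of memory|oom-kill|killed process'
}

# Hypothetical wrapper: show OOM kills from the kernel log, then restart PuppetDB.
# Assumes a systemd unit called "puppetdb"; adjust to the real unit name.
restart_puppetdb_after_oom() {
    sudo journalctl -k --no-pager | oom_kills
    sudo systemctl restart puppetdb
    systemctl is-active puppetdb
}
```

Running restart_puppetdb_after_oom on the PuppetDB instance would print any matching kernel lines and then the unit state after the restart. A restart only buys time if the instance is short on RAM, which is why resizing is the longer-term option discussed here.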

This was fixed, I think? Can it be closed? Or merged with T247206?

Krenair closed this task as Resolved. Mar 23 2020, 9:40 PM
Krenair assigned this task to cscott.

Let's avoid conflating this with T248041 / T247206