
deployment-prep: instances with puppet issues
Closed, Resolved, Public

Description

Hi,

there are a couple of instances in the deployment-prep project with puppet issues:

  • "deployment-redis05.deployment-prep.eqiad.wmflabs": "CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues"
  • "deployment-redis06.deployment-prep.eqiad.wmflabs": "CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues"

Please address that.

Assigning task to @fgiunchedi since he seems to be the creator of these instances.

Event Timeline

fgiunchedi added subscribers: elukey, Joe.

Indeed, I created these instances while helping out with the redis stretch migration, but I'm not familiar with their current state, the errors below, or whether these instances can be deleted, resized, etc. Perhaps @Joe or @elukey have input?

Puppet has been failing on both instances since Oct 14th, around 15:00 UTC.

Puppet errors:

filippo@deployment-redis05:~$ sudo puppet agent --test
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Reading data from httpyaml/deployment-prep/node/deployment-redis05.deployment-prep.eqiad.wmflabs failed: Psych::BadAlias: Unknown alias: id001 at /etc/puppet/manifests/realm.pp:63:15 on node deployment-redis05.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

filippo@deployment-redis06:~$ sudo puppet agent --test
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Reading data from httpyaml/deployment-prep/node/deployment-redis06.deployment-prep.eqiad.wmflabs failed: Psych::BadAlias: Unknown alias: id001 at /etc/puppet/manifests/realm.pp:63:15 on node deployment-redis06.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
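The Psych::BadAlias error comes from Ruby's YAML parser (Psych). YAML writers emit a value that appears more than once with an anchor (&id001) on first use and an alias (*id001) afterwards; PyYAML, for one, names its generated anchors id001, id002, and so on. A minimal sketch, assuming (not confirmed in this task) that the httpyaml Hiera backend parses node data with Psych.safe_load and leaves aliases disabled. The Hiera keys, IPs and the strict_load_error helper are hypothetical, and the aliases: keyword assumes a reasonably recent Ruby:

```ruby
require "yaml" # Ruby's YAML module is backed by Psych

# Hypothetical Hiera data (keys and IPs are made up for illustration). The list
# is written once with the anchor &id001 and reused through the alias *id001.
HIERA_WITH_ALIAS = <<~YAML
  profile::example::servers: &id001
    - 10.0.0.1
    - 10.0.0.2
  profile::example::replicas: *id001
YAML

# Parse the way a strict loader would: safe_load with aliases left disabled.
# Returns the error message, or nil if the document loads.
def strict_load_error(text)
  Psych.safe_load(text)
  nil
rescue Psych::BadAlias => e
  e.message
end

puts strict_load_error(HIERA_WITH_ALIAS)

# Opting in to aliases makes the same document load; both keys get the same list.
p Psych.safe_load(HIERA_WITH_ALIAS, aliases: true)
```

Older Psych versions report this as "Unknown alias: id001", as in the log above. Newer versions raise Psych::AliasesNotEnabled, a subclass of Psych::BadAlias, with a different message, so the rescue catches both.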
Krenair claimed this task.
Krenair subscribed.

Removed the instance Hiera data in Horizon. It was the source of these puppet failures, and it also contained IPs for the old, no-longer-existing deployment-redis0[12] hosts.
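If instance Hiera data is added back in Horizon later, one way to avoid repeating this failure is to check the YAML for aliases before saving it. A minimal sketch, assuming the backend rejects YAML aliases, as the Psych::BadAlias errors above suggest. The helper yaml_alias_names and the sample keys are hypothetical:

```ruby
require "yaml"

# List the anchor names that aliases (*name) refer to in a YAML document.
# Psych.parse_stream only builds the syntax tree, so no Ruby objects are
# constructed and no alias check is triggered. Psych nodes are Enumerable
# over the whole tree, so grep can pick out the Alias nodes directly.
def yaml_alias_names(text)
  Psych.parse_stream(text)
       .grep(Psych::Nodes::Alias)
       .map(&:anchor)
       .uniq
end

# Hypothetical Hiera data to check before pasting it into Horizon.
hiera = <<~YAML
  profile::example::servers: &id001
    - 10.0.0.1
  profile::example::replicas: *id001
YAML

aliases = yaml_alias_names(hiera)
puts aliases.empty? ? "no aliases" : "aliases found: #{aliases.join(', ')}"
```

The fix is to write the repeated value out in full under each key instead of relying on an anchor and alias.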

Mentioned in SAL (#wikimedia-cloud) [2018-10-31T21:16:52Z] <Krenair> remove horizon hiera config for deployment-redis0[56] to unbreak puppet and remove old redis0[12] instance IPs T208040