Forked from T393855: deployment-etcd05.deployment-prep.eqiad1.wikimedia.cloud has not run Puppet successfully since April 16:
The last Puppet run was at Wed Apr 16 12:32:36 UTC 2025 (37120 minutes ago).
It fails with:
May 12 07:02:42 deployment-etcd05 puppet-agent[165659]: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'prometheus::instances_defaults' on node deployment-etcd05.deployment-prep.eqiad1.wikimedia.cloud
Even though hieradata/common/prometheus.yaml has:
prometheus::instances_defaults:
  retention_time: 4032h
  retention_size: ~
  thanos_upload: true
  k8s_cluster_name: ~
  hosts: ~
  provision_lv_size: '50g'
I cannot figure out what is wrong in Puppet. On the Puppet server, a single change was applied after the last working run:
$ git cherry -v snapshot-202504161216 snapshot-202504161258|egrep '^\+'
+ 09b591c8145b013acd71f00e1fca7bb8e982c53e etcd: replace prometheus_all_nodes
That is https://gerrit.wikimedia.org/r/c/operations/puppet/+/1129177 (etcd: replace prometheus_all_nodes). It at least touches both etcd and prometheus, but I fail to find how it broke Puppet or how it relates to the missing prometheus::instances_defaults.