Spotted this while investigating something else: it looks like at least the acme-chief hosts are not in Prometheus. The bigger issue is that it is now possible to silently set $cluster to an invalid value (i.e. one not in wikimedia_clusters).
Event Timeline
Change 539927 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: add acmechief cluster
Change 539927 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: add acmechief cluster
Change 539934 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] WIP profile: sanity checks for cluster
Change 539934 merged by Filippo Giunchedi:
[operations/puppet@production] profile: sanity checks for cluster
Change 544155 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: fix cluster inconsistencies
Change 544155 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: fix cluster inconsistencies
In Cloud VPS, every VM I could check now has this puppet agent error:
aborrero@cloud-cumin-01:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Cluster misc not defined in wikimedia_clusters at /etc/puppet/modules/profile/manifests/base.pp:49:9 on node cloud-cumin-01.cloudinfra.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Looks like this is causing a basically vanilla VM on Cloud VPS to fail to run puppet (and therefore to fail to properly initialise ssh keys etc.).
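For readers following along, the error is consistent with a membership check of roughly the following shape. This is only a sketch in Python for illustration (the real check is Puppet code in profile/manifests/base.pp, per the error message); the function name `check_cluster` and the sample data are hypothetical, and the assumption that the Cloud VPS hieradata simply lacked a `misc` entry is inferred from the error text, not confirmed here.

```python
def check_cluster(cluster, wikimedia_clusters):
    """Return cluster if it is a known key, otherwise raise.

    Mirrors the wording of the error seen in the puppet agent output.
    """
    if cluster not in wikimedia_clusters:
        raise ValueError(f"Cluster {cluster} not defined in wikimedia_clusters")
    return cluster


# Hypothetical data: production knows about these clusters...
production_clusters = {"misc": {}, "acmechief": {}}
# ...while (by assumption) the Cloud VPS hieradata had no 'misc' entry.
wmcs_clusters = {}

check_cluster("acmechief", production_clusters)  # passes silently

try:
    check_cluster("misc", wmcs_clusters)
except ValueError as err:
    print(err)
```

Under that assumption, any VM whose $cluster resolves to misc would fail catalog compilation until misc is present in the wikimedia_clusters data it sees.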
wikidata-icinga.wikidata-dev.eqiad.wmflabs
rc.local[410]: Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Cluster misc not defined in wikimedia_clusters at /etc/puppet/modules/profile/manifests/base.pp:49:9 on node wikidata-icinga.wikidata-dev.eqiad.wmflabs
Looks like it's common across Cloud VPS.
@aborrero on IRC:
<arturo> tarrow: this seems to be a wide-spread issue
Change 544166 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: fix wikimedia_clusters for wmcs
Change 544166 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: fix wikimedia_clusters for wmcs