Right now there are 71 VMs that I can't reach via cloud cumin:
a11y.reading-web-staging.eqiad1.wikimedia.cloud,backend.wikicommunityhealth.eqiad1.wikimedia.cloud,canary[1027,1036]-01.cloudvirt-canary.eqiad1.wikimedia.cloud,canary-wdqs1003-01.cloudvirt-canary.eqiad1.wikimedia.cloud,client-[05,09].swift.eqiad1.wikimedia.cloud,client-a.monitoring.eqiad1.wikimedia.cloud,commonsarchive-mwtest.commonsarchive.eqiad1.wikimedia.cloud,cumin.mariadb104-test.eqiad1.wikimedia.cloud,fullstackd-20210723161838.admin-monitoring.eqiad1.wikimedia.cloud,gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud,lb-01.swift.eqiad1.wikimedia.cloud,locality.trove.eqiad1.wikimedia.cloud,logging-cassandra-01.logging.eqiad1.wikimedia.cloud,logging-elastic7-[02-03].logging.eqiad1.wikimedia.cloud,logging-grafana-01.logging.eqiad1.wikimedia.cloud,logging-logstash7-01.logging.eqiad1.wikimedia.cloud,logging-loki-[01-02].logging.eqiad1.wikimedia.cloud,logging-lts-01.logging.eqiad1.wikimedia.cloud,logging-puppet-05.logging.eqiad1.wikimedia.cloud,logging-puppetdb-03.logging.eqiad1.wikimedia.cloud,logging-sts-01.logging.eqiad1.wikimedia.cloud,maria1.trove.eqiad1.wikimedia.cloud,mariadb104-test1.mariadb104-test.eqiad1.wikimedia.cloud,metricsinfra-db-1.trove.eqiad1.wikimedia.cloud,ms-be-[01-02].swift.eqiad1.wikimedia.cloud,ms-fe-[01-03].swift.eqiad1.wikimedia.cloud,mwv-builder-03.mediawiki-vagrant.eqiad1.wikimedia.cloud,nehpets.reading-web-staging.eqiad1.wikimedia.cloud,pawsdb-1.trove.eqiad1.wikimedia.cloud,pki-01.swift.eqiad1.wikimedia.cloud,pontoon-acmechief-01.monitoring.eqiad1.wikimedia.cloud,pontoon-cumin-01.monitoring.eqiad1.wikimedia.cloud,pontoon-elastic7-02.monitoring.eqiad1.wikimedia.cloud,pontoon-frontend-02.monitoring.eqiad1.wikimedia.cloud,pontoon-grafana-01.monitoring.eqiad1.wikimedia.cloud,pontoon-graphite-03.monitoring.eqiad1.wikimedia.cloud,pontoon-icinga-01.monitoring.eqiad1.wikimedia.cloud,pontoon-kafka-01.monitoring.eqiad1.wikimedia.cloud,pontoon-kafkamon-01.monitoring.eqiad1.wikimedia.cloud,pontoon-log-[01-02].monitoring.eqiad1.wikimedia.cloud,pontoon-logstash7-03.monitoring.eqiad1.wikimedia.cloud,pontoon-ms-be-[01-02].monitoring.eqiad1.wikimedia.cloud,pontoon-mwlog-01.monitoring.eqiad1.wikimedia.cloud,pontoon-netmon-01.monitoring.eqiad1.wikimedia.cloud,pontoon-prometheus-01.monitoring.eqiad1.wikimedia.cloud,pontoon-puppet-[01,05].monitoring.eqiad1.wikimedia.cloud,pontoon-puppetdb-01.monitoring.eqiad1.wikimedia.cloud,pontoon-thanos-[01-02].monitoring.eqiad1.wikimedia.cloud,puppet-[01,03].swift.eqiad1.wikimedia.cloud,puppetdb.mariadb104-test.eqiad1.wikimedia.cloud,puppetmaster.mariadb104-test.eqiad1.wikimedia.cloud,relforge-search.search.eqiad1.wikimedia.cloud,server-[02-03].swift.eqiad1.wikimedia.cloud,server-a.monitoring.eqiad1.wikimedia.cloud,slave[1-2].mariadb104-test.eqiad1.wikimedia.cloud,zarcillo[0-1].mariadb104-test.eqiad1.wikimedia.cloud
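For anyone who wants to see how the unreachable VMs are spread across projects, here is a rough Python sketch that unfolds the bracket notation used in the list above and tallies hosts per project. It is a hand-rolled stand-in written for this list only, not the parser Cumin itself uses; the function names (`expand`, `split_patterns`, `count_by_project`) are made up for illustration, and it assumes at most one bracket group per pattern and hostnames of the form `<vm>.<project>.eqiad1.wikimedia.cloud`.

```python
import re
from collections import Counter


def expand(pattern):
    """Expand one folded pattern, e.g. 'ms-fe-[01-03].swift...' -> three hosts.

    Assumption: a pattern contains at most one [...] group.
    """
    m = re.search(r"\[([^\]]+)\]", pattern)
    if m is None:
        return [pattern]
    hosts = []
    for part in m.group(1).split(","):
        if "-" in part:
            lo, hi = part.split("-")
            # Preserve zero padding: 01-03 -> 01, 02, 03
            values = [str(n).zfill(len(lo)) for n in range(int(lo), int(hi) + 1)]
        else:
            values = [part]
        for value in values:
            hosts.append(pattern[:m.start()] + value + pattern[m.end():])
    return hosts


def split_patterns(folded):
    """Split the folded list on commas that are not inside square brackets."""
    return re.split(r",(?![^\[]*\])", folded)


def count_by_project(folded):
    """Return a Counter mapping project name -> number of hosts."""
    counts = Counter()
    for pattern in split_patterns(folded):
        for host in expand(pattern):
            # Assumption: the project is the second dot-separated label.
            counts[host.split(".")[1]] += 1
    return counts
```

Passing the full list above to `count_by_project(...)` and calling `.most_common()` on the result should show which projects account for most of the gap.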
Many of those hosts seem to have puppet disabled or broken because they're managed by pontoon. I expected this to be true exclusively in the 'pontoon' project but it seems also to be true at least in 'swift', and possibly elsewhere.
At the moment, it is a requirement of Cloud VPS usage that VMs be puppetized and accessible with Cumin. A good example of why turned up today: the Libera people were on the verge of blocking cloud access to their servers because of misbehaving IRC clients. Forensics were difficult on the cuminless hosts, and ultimately one of the culprits turned out to be a host in the pontoon project. See T287265 for details.
My preference is to cut SRE staff some slack regarding the puppet/cumin requirements, but these gaps in our access and observability are a growing problem. If pontoon work continues on Cloud VPS, please prioritize restoring standard puppet configs, or at the very least not breaking cumin access on these VMs.
Thanks!