There are two parts to address here:
- (collaboration-services) Renewing the certificate
- (SRE Observability) The fact that it's an expiry warning with a negative timer (-18 days at the moment of creating the task) instead of an expiry alert
In theory this should have become critical at 1 week remaining - is the critical alert defined properly?
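Below is a minimal Python sketch of the severity logic such an expiry alert would need. It assumes the certificate's expiry is available as a `not_after` timestamp. The function name `expiry_severity` and the 30-day warning threshold are hypothetical. Only the 7-day critical threshold comes from this ticket. The check for the most severe state runs first, so a certificate that has already expired can never fall through to a plain warning.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds: the 7-day critical value is taken from this ticket,
# the 30-day warning value is an assumption for illustration only.
WARNING_DAYS = 30
CRITICAL_DAYS = 7


def expiry_severity(not_after: datetime, now: datetime) -> str:
    """Classify a certificate as 'expired', 'critical', 'warning' or 'ok'."""
    # Dividing two timedeltas yields a float number of days (negative once expired).
    remaining_days = (not_after - now) / timedelta(days=1)
    if remaining_days <= 0:
        # Past expiry: report it as such instead of as a (negative) warning.
        return "expired"
    if remaining_days <= CRITICAL_DAYS:
        return "critical"
    if remaining_days <= WARNING_DAYS:
        return "warning"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A certificate that expired 18 days ago, as in this ticket.
    print(expiry_severity(now - timedelta(days=18), now))
```

With this ordering, the "-18 days" case lands in `expired` and not in `warning`. That is the behaviour the observability part of this ticket asks for.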
It's not clear whether we actually need the separate puppetmaster, looking into that.
None of the current project members outside of Collab have strong opinions on the project configuration so we're OK to make changes as we see fit.
Let's just delete that local puppetmaster and configure the instance to use the global default puppetmaster.
Worst case this means some local hack (that nobody is aware of) disappears.
If it turns out a local puppetmaster is needed, then recreating it is probably also the easier fix for the expired CA problem.
Good news:
There is only one instance in that project and I went and checked which puppetmaster it is configured to use. And the answer is "not this one with the cert issue". It uses the global cloud puppetmaster.
So nothing changes for that instance, and nothing uses the puppetmaster that this ticket is about.
```
root@mailman1:/# hostname -f
mailman1.mailman.eqiad1.wikimedia.cloud
root@mailman1:/# grep server /etc/puppet/puppet.conf
server = puppetmaster.cloudinfra.wmflabs.org
ca_server = puppetmaster.cloudinfra.wmflabs.org
```
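The same check can be scripted as a minimal Python sketch, in case more instances ever need auditing. It assumes `puppet.conf` uses INI-style section headers such as `[main]` or `[agent]`. The grep output above does not show which section the settings live in, and a file with no section header would make `configparser` raise an error. The helper name `puppet_servers` is hypothetical. It is also a simplification: it does not model Puppet's own section precedence rules, and a later section simply overrides an earlier one.

```python
import configparser


def puppet_servers(conf_text: str) -> dict[str, str]:
    """Collect 'server' and 'ca_server' from puppet.conf text (hypothetical helper)."""
    # interpolation=None: Puppet values may contain '%' or '$' characters.
    # strict=False: tolerate duplicate sections/options instead of raising.
    # allow_no_value=True: tolerate bare keys without '='.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string(conf_text)
    found: dict[str, str] = {}
    for section in parser.sections():
        for key in ("server", "ca_server"):
            value = parser.get(section, key, fallback=None)
            if value:
                # Later sections overwrite earlier ones (simplified precedence).
                found[key] = value
    return found


if __name__ == "__main__":
    with open("/etc/puppet/puppet.conf", encoding="utf-8") as fh:
        print(puppet_servers(fh.read()))
```

If both values point at `puppetmaster.cloudinfra.wmflabs.org` and not at the project-local puppetmaster, the instance does not depend on the expired CA.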
Here is the history of that puppetmaster instance:
https://horizon.wikimedia.org/project/instances/a8b2ff57-9841-434b-b233-d8c0064ddb1b/
It was created in March 2024 by @Andrew, but I would bet a lot that this was either just a migration of something that existed before, or done at someone else's request (and that person did not end up actually using it).
Mentioned in SAL (#wikimedia-cloud) [2025-09-26T19:16:32Z] <mutante> shutting down instance mailman-puppetserver-1 - T402889
I shut mailman-puppetserver-1.mailman down, but I am not deleting it just yet, so if I am wrong it can be started again.
Confirmed this had no effect on instance mailman1.mailman where puppet still works fine.
One mouse click and this instance is deleted. The renewal part is moot now.
This just leaves the part for observability in this ticket.
Mentioned in SAL (#wikimedia-cloud) [2025-10-01T19:44:13Z] <mutante> deleting instance mailman-puppetserver-1 with expired CA and no users - T402889
The certificate renewal was solved by @Dzahn, I'll follow up on the alerting component separately.