
Puppet CA certificate Puppet CA: mailman-puppetmaster.mailman.eqiad.wmflabs expired
Closed, Resolved (Public)

Description

https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire&q=project%3Dmailman&q=%40receiver%3Dblackhole

There are two parts to address here:

  • (collaboration-services) Renewing the certificate
  • (SRE Observability) The alert fired as an expiry warning with a negative timer (-18 days at the time this task was created) instead of as an expiry alert
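To illustrate the second point, here is a minimal sketch of how a check could tell "about to expire" apart from "already expired" based on the sign of the remaining time. This is not the actual alert rule behind PuppetCertificateAboutToExpire; the function names and the 30-day warning window are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical warning window; the real alert's threshold is not stated in this task.
WARN_DAYS = 30


def days_until_expiry(not_after, now):
    """Return the (possibly negative) number of days until the certificate expires."""
    return (not_after - now) / timedelta(days=1)


def classify(not_after, now, warn_days=WARN_DAYS):
    """Map the remaining lifetime to a state, treating a negative value as expired."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining <= warn_days:
        return "about_to_expire"
    return "ok"
```

With a certificate whose expiry date lies 18 days in the past, `classify` returns "expired" rather than reporting a warning with a remaining time of -18 days.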

Event Timeline

It's not clear whether we actually need the separate puppetmaster; looking into that.

LSobanski triaged this task as Low priority.
LSobanski moved this task from Incoming to Work in Progress on the collaboration-services board.

None of the current project members outside of Collab have strong opinions on the project configuration so we're OK to make changes as we see fit.

LSobanski moved this task from Radar to Inbox on the SRE Observability board.

Let's just delete that local puppetmaster and configure the instance to use the global default puppetmaster.

Worst case this means some local hack (that nobody is aware of) disappears.

If it turns out a local puppetmaster is needed, then recreating it is probably also the easier fix for the expired CA problem.

Good news:

There is only one instance in that project, and I checked which puppetmaster it is configured to use. The answer is "not this one with the cert issue": it uses the global cloud puppetmaster.

So there is no change on that instance and nothing uses the puppetmaster that this ticket is about.

root@mailman1:/# hostname -f
mailman1.mailman.eqiad1.wikimedia.cloud

root@mailman1:/# grep server /etc/puppet/puppet.conf
server = puppetmaster.cloudinfra.wmflabs.org
ca_server = puppetmaster.cloudinfra.wmflabs.org
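The same check can be scripted if more than one instance ever needs auditing. The following is a rough sketch that reads the text of a puppet.conf and reports its `server` and `ca_server` settings; the helper names are hypothetical, and the parser is deliberately simplistic (it ignores sections, so if a key appears more than once the last value wins).

```python
def puppet_servers(conf_text):
    """Return the server and ca_server settings found in puppet.conf text."""
    found = {}
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines, section headers such as [main], and non-settings.
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("server", "ca_server"):
            found[key] = value
    return found


def uses_puppetmaster(conf_text, hostname):
    """Return True if either setting points at the given puppetmaster host."""
    return hostname in puppet_servers(conf_text).values()
```

Run against the configuration shown above, `uses_puppetmaster(conf_text, "mailman-puppetmaster.mailman.eqiad.wmflabs")` returns False, which matches the finding that nothing uses the puppetmaster with the expired CA.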

Here is the history of that puppetmaster instance:

https://horizon.wikimedia.org/project/instances/a8b2ff57-9841-434b-b233-d8c0064ddb1b/

It was created in March 2024 by @Andrew, but I would bet it was either migrated from something that existed before that or created at the request of someone else (who did not end up actually using it).

Mentioned in SAL (#wikimedia-cloud) [2025-09-26T19:16:32Z] <mutante> shutting down instance mailman-puppetserver-1 - T402889

I shut mailman-puppetserver-1.mailman down. Not deleting it just yet. So if I am wrong it can be started again.

Confirmed this had no effect on instance mailman1.mailman where puppet still works fine.

One mouse click and this instance is deleted. The renewal part is moot now.

This just leaves the part for observability in this ticket.

Mentioned in SAL (#wikimedia-cloud) [2025-10-01T19:44:13Z] <mutante> deleting instance mailman-puppetserver-1 with expired CA and no users - T402889

LSobanski claimed this task.

The certificate renewal was solved by @Dzahn; I'll follow up on the alerting component separately.