As this was the ultimate root cause of the outage, we should make sure there aren't any old CA certs floating around that could confuse things again.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | bd808 | T232536 Toolforge Kubernetes internal API down, causing `webservice` and other tooling to fail
Resolved | | Andrew | T232772 Audit tools project puppet CA certs to ensure that they are all consistent
Event Timeline
The cert related to the outage was on the server itself, at the location described in https://phabricator.wikimedia.org/T148929#2817428
I just did a quick check (yay cumin!). You might want to check /var/lib/puppet/client/ssl/certs/ca.pem on tools-elastic-01.tools.eqiad.wmflabs. Other than that, that file and, more importantly, /var/lib/puppet/ssl/certs/ca.pem and /etc/ssl/certs/Puppet_Internal_CA.pem look consistent.
I fixed the file that @Krenair mentioned and confirmed that /var/lib/puppet/ssl/certs/ca.pem == /etc/ssl/certs/Puppet_Internal_CA.pem == /var/lib/puppet/client/ssl/certs/ca.pem on all hosts in tools.
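For anyone repeating this audit later, the per-host comparison can be sketched roughly as below. This is not the command that was actually run here; it is a minimal illustration that assumes "consistent" means byte-identical files and that a path missing on a given host (for example, no /var/lib/puppet/client directory) should be skipped rather than treated as a mismatch. The function names `digest` and `ca_certs_consistent` are made up for this sketch.

```python
import hashlib
from pathlib import Path

# The three CA cert locations named in the comments above.
CA_PATHS = [
    "/var/lib/puppet/ssl/certs/ca.pem",
    "/etc/ssl/certs/Puppet_Internal_CA.pem",
    "/var/lib/puppet/client/ssl/certs/ca.pem",
]


def digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def ca_certs_consistent(paths=CA_PATHS):
    """Compare the CA cert files on this host.

    Returns (ok, digests). ok is True when every file that exists has the
    same digest; digests maps each path to its digest (None when missing).
    Assumption: missing files are ignored instead of counted as a mismatch.
    """
    digests = {path: digest(path) for path in paths}
    present = {d for d in digests.values() if d is not None}
    return len(present) <= 1, digests
```

Calling `ca_certs_consistent()` on a host and printing the returned digests for any host where `ok` is False should point at the odd file out. Note that a byte comparison flags files that hold the same certificate with different whitespace, so a mismatch is a prompt to look closer, not proof of a wrong CA.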