|Status|Subtype|Assigned|Task|
|Resolved||bd808|T273956 acme-chief sometimes doesn't refresh certificates because it ignores SIGHUP|
|Open||None|T293585 [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy|
|Resolved|BUG REPORT|Krenair|T257968 Certificate for *.beta.wmflabs.org has expired (July 2020)|
|Resolved||Vgutierrez|T259338 do not generate metadata for parts that aren't allowed|
it's UBN (Unbreak Now) because beta is down and this task is in the beta project, not because of any perceived security risk (it's only beta)
initial glance: certs on the box look fine:
root@deployment-cache-text06:/etc/acmecerts/unified/live# openssl x509 -in /etc/acmecerts/unified/live/rsa-2048.chained.crt -noout -text | grep After
    Not After : Sep 14 05:29:00 2020 GMT
root@deployment-cache-text06:/etc/acmecerts/unified/live# openssl x509 -in /etc/acmecerts/unified/live/ec-prime256v1.chained.crt -noout -text | grep After
    Not After : Sep 14 05:29:37 2020 GMT
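For anyone who wants to script the same check, here is a minimal sketch that turns one of those `Not After` lines into a datetime and counts the days left. The function names are made up for illustration, and it assumes the line looks exactly like the openssl output above (English month names, GMT suffix).

```python
from datetime import datetime, timezone


def parse_not_after(line):
    """Parse a line like 'Not After : Sep 14 05:29:00 2020 GMT' into a UTC datetime."""
    # Split on the first colon only; the timestamp itself contains colons.
    text = line.split(":", 1)[1].strip()
    if text.endswith(" GMT"):
        text = text[:-4]
    parsed = datetime.strptime(text, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def days_until_expiry(line, now):
    """Whole days between `now` and the certificate's notAfter (negative if expired)."""
    return (parse_not_after(line) - now).days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(days_until_expiry("Not After : Sep 14 05:29:00 2020 GMT", now))
```

A negative result would mean the cert on disk has expired; a positive one, as here in July 2020, means the files are fine and the problem lies with whatever is serving them.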
puppet has a couple of errors due to OOM, also some acme-chief interaction problems but I'm not sure why that would matter if the new certs are on the box
the immediate problem is solved: I manually did the cert reload (something like touch /srv/trafficserver/tls/etc/ssl_multicert.config && /bin/systemctl reload trafficserver, except that there are two different ssl_multicert.config files on the system and two different trafficserver services)
@Vgutierrez: I'm guessing puppet failed to run the reload exec itself because of the errors talking to acme-chief: puppet reported Error 400 on SERVER: part must be in ['ec-prime256v1.crt', 'ec-prime256v1.chain.crt', 'ec-prime256v1.chained.crt', 'ec-prime256v1.key', 'ec-prime256v1.ocsp', 'rsa-2048.crt', 'rsa-2048.chain.crt', 'rsa-2048.chained.crt', 'rsa-2048.key', 'rsa-2048.ocsp'], and requests like /puppet/v3/file_content/acmedata/mx/bfcd4752e6b346289533bcb6934671a2/rsa-2048.crt.key?environment=production& showed up in the uwsgi-acme-chief logs. The host had the new puppet classes and was making the new .crt.key CERTIFICATE_TYPE calls to acme-chief, and the acme-chief instance had v0.26 installed, but the uwsgi-acme-chief service on the acme-chief box had not been restarted, so it was presumably still running the old code. I wonder if we should automatically restart uwsgi-acme-chief when the acme-chief package is upgraded somehow (puppet?)
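The 400 is consistent with the API checking the requested part against a fixed list. A rough sketch of that kind of check is below; this is not acme-chief's actual code, `validate_part` is a hypothetical name, and the list is just copied from the error message above.

```python
# Allowed parts as printed in the puppet error message (pre-restart service).
ALLOWED_PARTS = [
    "ec-prime256v1.crt", "ec-prime256v1.chain.crt", "ec-prime256v1.chained.crt",
    "ec-prime256v1.key", "ec-prime256v1.ocsp",
    "rsa-2048.crt", "rsa-2048.chain.crt", "rsa-2048.chained.crt",
    "rsa-2048.key", "rsa-2048.ocsp",
]


def validate_part(part):
    """Return the part if it is allowed, otherwise raise (the API would answer 400)."""
    if part not in ALLOWED_PARTS:
        raise ValueError(f"part must be in {ALLOWED_PARTS}")
    return part


if __name__ == "__main__":
    print(validate_part("rsa-2048.crt"))
    try:
        validate_part("rsa-2048.crt.key")  # the new-style part the clients requested
    except ValueError as err:
        print("rejected:", err)
```

With a list like this loaded in the still-running old process, a request for rsa-2048.crt.key gets rejected even though the newer package on disk would accept it.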
I think we have two bugs here:
- The API service must be restarted as well after an acme-chief upgrade.
- The API service shouldn't list non-allowed files. I suspect that dropping a file in the cert directory would break puppet on the acme-chief clients right now.
I'm currently on vacation; I'll handle those issues next week
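For the second bug, one possible shape of the fix is to filter the directory listing against the allowlist before generating metadata. This is only a sketch under that assumption: `list_allowed_parts` is a hypothetical helper, not the real acme-chief function, and the set is taken from the error message earlier in this task.

```python
# Parts taken from the "part must be in [...]" error quoted in this task.
ALLOWED_PARTS = {
    "ec-prime256v1.crt", "ec-prime256v1.chain.crt", "ec-prime256v1.chained.crt",
    "ec-prime256v1.key", "ec-prime256v1.ocsp",
    "rsa-2048.crt", "rsa-2048.chain.crt", "rsa-2048.chained.crt",
    "rsa-2048.key", "rsa-2048.ocsp",
}


def list_allowed_parts(filenames, allowed=ALLOWED_PARTS):
    """Keep only files clients may request, so stray files never reach the metadata."""
    return sorted(name for name in filenames if name in allowed)


if __name__ == "__main__":
    # A stray file dropped into the cert directory is silently skipped.
    print(list_allowed_parts(["rsa-2048.crt", "rsa-2048.key", "notes.txt"]))
```

That way a stray file in the cert directory is never advertised to clients, so they never ask for a part the API would then refuse.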