Certificate for *.beta.wmflabs.org has expired (July 2020)
Closed, Resolved · Public · BUG REPORT

Event Timeline

This should really have Unbreak Now! priority, as users/visitors perceive it as a security issue.

Krenair triaged this task as Unbreak Now! priority.
Krenair subscribed.

looking

it's UBN because beta is down and this task is in the beta project, not due to perceived security risk (it's only beta)
initial glance: certs on the box look fine:

root@deployment-cache-text06:/etc/acmecerts/unified/live# openssl x509 -in /etc/acmecerts/unified/live/rsa-2048.chained.crt -noout -text | grep After
            Not After : Sep 14 05:29:00 2020 GMT
root@deployment-cache-text06:/etc/acmecerts/unified/live# openssl x509 -in /etc/acmecerts/unified/live/ec-prime256v1.chained.crt -noout -text | grep After
            Not After : Sep 14 05:29:37 2020 GMT

puppet has a couple of errors due to OOM, also some acme-chief interaction problems but I'm not sure why that would matter if the new certs are on the box
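
Since the files on disk were already renewed, one way to confirm this kind of mismatch is to compare the expiry of the certificate on disk with the one the server actually presents. A minimal sketch, assuming openssl is available on the host; the function names are hypothetical, and the hostname in the usage comment is simply one of the beta hostnames mentioned in this task:

```shell
# Print the notAfter date of a certificate file on disk.
disk_expiry() {
    openssl x509 -in "$1" -noout -enddate | cut -d= -f2
}

# Print the notAfter date of the certificate a server presents for a hostname.
served_expiry() {
    openssl s_client -connect "${1}:443" -servername "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -enddate | cut -d= -f2
}

# Compare two expiry strings; "stale" suggests the service has not
# reloaded the certificate that is on disk.
compare_expiry() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "stale"
    fi
}

# Usage (not run here):
#   compare_expiry \
#       "$(disk_expiry /etc/acmecerts/unified/live/rsa-2048.chained.crt)" \
#       "$(served_expiry en.wikisource.beta.wmflabs.org)"
```

Note that the server may present either the RSA or the EC certificate, and the two expiry times above differ by a few seconds, so the comparison should be made against the matching file.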

Krenair lowered the priority of this task from Unbreak Now! to High. Edited Jul 14 2020, 8:50 PM

the immediate problem is solved: I manually did the cert reload (something like touch /srv/trafficserver/tls/etc/ssl_multicert.config && /bin/systemctl reload trafficserver, except that there are two different ssl_multicert.config files on the system and two different trafficserver services)
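
The manual reload described above could be wrapped roughly as follows. This is only a sketch: the helper name and the DRY_RUN switch are hypothetical, and because the second ssl_multicert.config path and the second service name are not given here, both are left as arguments rather than guessed:

```shell
# Usage: reload_ats_certs CONFIG_FILE SERVICE_NAME
# Touches the config file, then reloads the service (the sequence described above).
# With DRY_RUN=1 the command is only printed, not executed.
reload_ats_certs() {
    config="$1"
    service="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "touch $config && systemctl reload $service"
        return 0
    fi
    touch "$config" && systemctl reload "$service"
}

# Usage (not run here), once per config file / service pair:
#   reload_ats_certs /srv/trafficserver/tls/etc/ssl_multicert.config trafficserver
```

Setting DRY_RUN=1 first makes it possible to check which config file and service a call would touch before running it for real.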

@Vgutierrez: I'm guessing puppet failed to run the reload exec itself because of the errors connecting to acme-chief. Puppet reported Error 400 on SERVER: part must be in ['ec-prime256v1.crt', 'ec-prime256v1.chain.crt', 'ec-prime256v1.chained.crt', 'ec-prime256v1.key', 'ec-prime256v1.ocsp', 'rsa-2048.crt', 'rsa-2048.chain.crt', 'rsa-2048.chained.crt', 'rsa-2048.key', 'rsa-2048.ocsp'], and requests like /puppet/v3/file_content/acmedata/mx/bfcd4752e6b346289533bcb6934671a2/rsa-2048.crt.key?environment=production& were showing up in the uwsgi-acme-chief logs. It had new puppet classes and was making the new .crt.key CERTIFICATE_TYPE calls to acme-chief, and the acme-chief instance had v0.26 installed, but the uwsgi-acme-chief service on the acme-chief box had not been restarted. I wonder if we should automatically restart uwsgi-acme-chief when the acme-chief package is upgraded somehow (puppet?)
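
To illustrate why the not-yet-restarted API process rejected those requests: a name such as rsa-2048.crt.key is simply not in the list of parts quoted in the error. The snippet below only illustrates that membership check and is not acme-chief's actual code; the function and variable names are hypothetical, and the list is copied from the error message:

```shell
# Parts accepted by the not-yet-restarted API, copied from the error message.
ALLOWED_PARTS="ec-prime256v1.crt ec-prime256v1.chain.crt ec-prime256v1.chained.crt ec-prime256v1.key ec-prime256v1.ocsp rsa-2048.crt rsa-2048.chain.crt rsa-2048.chained.crt rsa-2048.key rsa-2048.ocsp"

# Return 0 if the requested part is in the allowed list, 1 otherwise.
is_allowed_part() {
    for part in $ALLOWED_PARTS; do
        if [ "$part" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Report how a request for each given part would be treated.
check_parts() {
    for requested in "$@"; do
        if is_allowed_part "$requested"; then
            echo "$requested: ok"
        else
            echo "$requested: rejected (400)"
        fi
    done
}
```

For example, check_parts rsa-2048.crt rsa-2048.crt.key would accept the first name and reject the second, which matches the failing request seen in the logs.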

I think we have two bugs here:

  1. The API service must be restarted as well after an acme-chief upgrade.
  2. The API service shouldn't list non-allowed files; I suspect that dropping a file in the cert directory would break puppet on the acme-chief clients right now.

I'm currently on vacation; I'll handle those issues next week
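
One way to spot the first bug on a running host is to check whether the service was last started before the package was last installed or upgraded. A rough sketch, assuming a Debian-based host with systemd and GNU date; it uses the modification time of dpkg's /var/lib/dpkg/info/acme-chief.list file as a proxy for the upgrade time, which is an assumption rather than something stated in this task, and the function names are hypothetical:

```shell
# Epoch seconds at which a systemd unit last became active.
service_start_epoch() {
    date -d "$(systemctl show -p ActiveEnterTimestamp --value "$1")" +%s
}

# Epoch seconds at which dpkg last wrote the file list of a package
# (assumed here to approximate the last install/upgrade time).
package_upgrade_epoch() {
    stat -c %Y "/var/lib/dpkg/info/$1.list"
}

# Usage: needs_restart SERVICE_START_EPOCH PACKAGE_UPGRADE_EPOCH
# Prints "restart needed" if the service started before the upgrade.
needs_restart() {
    if [ "$1" -lt "$2" ]; then
        echo "restart needed"
    else
        echo "up to date"
    fi
}

# Usage (not run here):
#   needs_restart "$(service_start_epoch uwsgi-acme-chief)" \
#                 "$(package_upgrade_epoch acme-chief)"
```

This only reports the condition; whether to act on it automatically is a separate decision.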

I'm still getting the cert error on https://upload.beta.wmflabs.org . Other subdomains, e.g. https://en.wikisource.beta.wmflabs.org , are working fine now.

> I'm still getting the cert error on https://upload.beta.wmflabs.org . Other subdomains, e.g. https://en.wikisource.beta.wmflabs.org , are working fine now.

fixed

> I think we have two bugs here:
>
>   1. The API service must be restarted as well after an acme-chief upgrade.
>   2. The API service shouldn't list non-allowed files; I suspect that dropping a file in the cert directory would break puppet on the acme-chief clients right now.
>
> I'm currently on vacation; I'll handle those issues next week

so with T259338 solved, issue number 2 is fixed. Regarding the API service restart, I don't think it's desirable to trigger automatic restarts, because that could cause puppet failures across the servers that use acme_chief::cert resources in their puppetization.

@Vgutierrez: Can we make it dynamically reload its code somehow? We should probably have another task for this if so, I'm resolving this one

Aklapper renamed this task from Certificate for *.beta.wmflabs.org has expired to Certificate for *.beta.wmflabs.org has expired (July 2020). Oct 18 2021, 6:58 AM