
The certificate for upload.beta.wmflabs.org expired on January 12, 2021.
Closed, Resolved, Public

Description

https://upload.beta.wmflabs.org/wikipedia/commons/thumb/a/a4/2014-06-18_Rheinwerk_1.jpg/120px-2014-06-18_Rheinwerk_1.jpg

Expected behavior: Being able to connect to the domain with a non-expired certificate
What happened instead: I can connect to the domain, but only after accepting the expired certificate.

Reminds me of T267858.

Event Timeline

Certificates last 3 months, so this is probably a similar issue.

root@deployment-cache-upload06:/etc/acmecerts/unified/live# openssl x509 -dates -noout -in rsa-2048.crt
notBefore=Jan 12 01:23:09 2021 GMT
notAfter=Apr 12 01:23:09 2021 GMT
root@deployment-cache-upload06:/etc/acmecerts/unified/live# touch /srv/trafficserver/tls/etc/ssl_multicert.config
root@deployment-cache-upload06:/etc/acmecerts/unified/live# systemctl reload trafficserver-tls.service

It should be up and running now. I'm not really familiar with the cloud puppetization, but this doesn't mimic production behaviour.
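The manual workaround above (touch `ssl_multicert.config`, then reload `trafficserver-tls.service`) could be wrapped in a small guard so that a timer or cron job only reloads when a renewed certificate has not been picked up yet. This is a minimal sketch, assuming (as the workaround suggests) that touching `ssl_multicert.config` is what makes the reload load the new certificate; `reload_if_cert_newer` is a hypothetical helper, not part of the existing puppetization:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Beta Cluster puppetization.
# Assumption: trafficserver-tls only picks up a renewed certificate when
# ssl_multicert.config has been touched before the reload (the manual
# workaround used in this task).

# Usage: reload_if_cert_newer CERT CONF RELOAD_CMD [ARGS...]
# Touches CONF and runs RELOAD_CMD only if CERT is newer than CONF.
reload_if_cert_newer() {
  if [ "$#" -lt 3 ]; then
    echo "usage: reload_if_cert_newer CERT CONF RELOAD_CMD [ARGS...]" >&2
    return 2
  fi
  local cert="$1" conf="$2"
  shift 2                       # remaining arguments form the reload command
  if [ "$cert" -nt "$conf" ]; then
    # Bump the config mtime first, then run the reload command.
    touch "$conf" && "$@"
  fi
  # Otherwise the config is already newer than the cert: nothing to do.
}

# Example with the paths from this task (commented out so the file can be
# sourced safely):
# reload_if_cert_newer /etc/acmecerts/unified/live/rsa-2048.crt \
#   /srv/trafficserver/tls/etc/ssl_multicert.config \
#   systemctl reload trafficserver-tls.service
```

Because the config's mtime is bumped before the reload, a second run after a successful reload is a no-op until the ACME client writes a newer certificate. One trade-off of this ordering: if the reload command itself fails, the next run will not retry it.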

RhinosF1 assigned this task to Vgutierrez.

Fixed per discussion on #wikimedia-traffic, at least for another 90 days.

But yeah, I guess this should be fixed or monitored better so that it doesn't need a manual reload.
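Along the lines of better monitoring: in this task the renewed certificate appears to have already been on disk while the old one was still being served, so a check against the certificate the host actually serves would have caught it. A minimal sketch, assuming GNU `date` (for `date -d`) and the `openssl` CLI; the function names and the 21-day threshold are hypothetical choices for illustration:

```shell
#!/usr/bin/env bash
# Sketch of an expiry check for the certificate a host actually serves,
# as opposed to the file on disk. Requires GNU date and openssl.

# Whole days (truncated) from "now" until an openssl-style date such as
# "Apr 12 01:23:09 2021 GMT". $2 overrides "now" (epoch seconds).
# Prints a negative number if the date is already in the past.
days_until() {
  local not_after="$1" now="${2:-$(date -u +%s)}" end
  end=$(date -u -d "$not_after" +%s) || return 2
  echo $(( (end - now) / 86400 ))
}

# Print the notAfter date of the certificate served on host:443 (SNI = host).
served_not_after() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null \
    | sed 's/^notAfter=//'
}

# Exit status: 0 OK, 1 expires within the threshold, 2 unknown.
check_served_cert() {
  local host="$1" threshold="${2:-21}" not_after left   # 21 days: arbitrary
  not_after=$(served_not_after "$host")
  if [ -z "$not_after" ]; then
    echo "UNKNOWN: could not read a certificate from $host"
    return 2
  fi
  left=$(days_until "$not_after") || return 2
  if [ "$left" -lt "$threshold" ]; then
    echo "WARNING: $host serves a certificate expiring in $left days ($not_after)"
    return 1
  fi
  echo "OK: $host certificate expires in $left days ($not_after)"
}
```

For example, `check_served_cert upload.beta.wmflabs.org` prints one status line and returns non-zero when the served certificate is close to expiry (or unreadable), so it could be run from cron or wrapped by an existing monitoring agent. For the certificate in this task, `days_until "Apr 12 01:23:09 2021 GMT"` with "now" set to its notBefore time (Jan 12 01:23:09 2021 GMT) gives 90.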

> But yeah, I guess this should be fixed or monitored better so that it doesn't need a manual reload.

Is that worth a dedicated follow-up ticket under observability and Beta-Cluster-Infrastructure?

> Is that worth a dedicated follow-up ticket under observability and Beta-Cluster-Infrastructure?

I'd say yes.

> It should be up and running now. I'm not really familiar with the cloud puppetization, but this doesn't mimic production behaviour.

It should be roughly the same; we may have some differing hieradata.

> Is that worth a dedicated follow-up ticket under observability and Beta-Cluster-Infrastructure?
>
> I'd say yes.

We have T271778, which misses the upload part but basically covers this issue in its second bullet point.

Mentioned in SAL (#wikimedia-releng) [2021-10-16T16:47:13Z] <Lucas_WMDE> root@deployment-cache-upload06:~# touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service # T293070, based on T271808#6739578