Monitor internal CA expirations
Closed, Declined (Public)

Description

Yesterday the WMF CA 2014-2017 expired without any kind of warning, causing the handful of certificates issued by it to expire, which in turn cascaded into a widespread outage of monitoring tools and WMCS. We should monitor the expiration of that CA and warn sufficiently far in advance of it.

Side note: I renewed the CA for another 3 years (2017-2020) rather than e.g. 10 on purpose, to make sure that these CA refreshing procedures are exercised often. I'd be inclined to make it even shorter.

While at it, we should make sure other internal CAs are monitored as well. The Puppet CA, which is used more as a general-purpose CA these days, also immediately comes to mind.
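As a rough illustration of the kind of check meant here, the sketch below classifies a CA certificate by the number of days left before its `notAfter` date and maps that to Nagios/Icinga-style states. It is only a sketch under assumptions: the function names, the 60/30-day thresholds and the example dates are hypothetical and not taken from any existing check. It expects the date string in the form printed by `openssl x509 -enddate -noout` (without the `notAfter=` prefix).

```python
import ssl
from datetime import datetime, timezone

# Nagios/Icinga-style states (a common convention, assumed here).
OK, WARNING, CRITICAL = 0, 1, 2


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return days from `now` until `not_after`.

    `not_after` uses the certificate time format, e.g.
    'Jul 18 00:00:00 2020 GMT'. `now` must be timezone-aware.
    """
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    return (expiry_ts - now.timestamp()) / 86400


def check_expiry(not_after: str, now: datetime,
                 warn_days: int = 60, crit_days: int = 30) -> int:
    """Map the remaining lifetime to OK / WARNING / CRITICAL.

    The thresholds are hypothetical defaults; an already expired
    certificate has a negative remaining lifetime and is CRITICAL.
    """
    days_left = days_until_expiry(not_after, now)
    if days_left <= crit_days:
        return CRITICAL
    if days_left <= warn_days:
        return WARNING
    return OK


if __name__ == "__main__":
    # Hypothetical expiry date, for illustration only.
    example_not_after = "Jul 18 00:00:00 2020 GMT"
    state = check_expiry(example_not_after, datetime.now(timezone.utc))
    print({OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state])
```

The same function could be run against every internal CA certificate, with the warning threshold chosen to leave enough time to go through the renewal procedure.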

Event Timeline

faidon changed the task status from Open to Stalled. Aug 21 2017, 3:13 PM

Setting to stalled until we decide what to actually do with the internal CA, as we're considering dropping it entirely in favour of other options.

faidon added a subscriber: Dzahn.

@akosiaris / @faidon: Has this situation changed now that T133717: Letsencrypt all the prod things we can - planning / T194962: Create and deploy a centralized letsencrypt service / Acme-chief are resolved (though I'm not sure whether that work touched CA monitoring at all)?
Asking as tasks shouldn't remain stalled for too long.

Nope. All of those are for public-facing services; the internal CA was also used for some non-public-facing ones.

That being said, the CA that triggered this task expired on Jul 18th, but this time it caused no issues, as very few certificates had been issued by it and we seem to have moved away from using it. So, per T171157#3538015, I think we can close this as Declined.
