This was originally on our long-term radar as part of the (forever-stalled and in-discussion!) [H]PKP ticket: T92002. The recent GlobalSign issue has highlighted the need to break this out as a higher-priority action that we should take on independently of that ticket.
We need to obtain our "unified" cert from two vendors with the same SAN set, in both ECC and RSA forms. Ideally the annual renewal time for each should be at least slightly offset (~1 month?). We'll puppetize the deployment of both keys to all of the cache clusters, including live OCSP staple fetching for both from everywhere.
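The two requirements above (identical SAN sets across vendors, and offset renewal dates) are easy to check mechanically once the certs are issued. A minimal sketch, assuming the SAN list and expiry date have already been extracted from each cert by some other tool; `CertInfo`, `check_pair`, and the 30-day default are hypothetical names and values for illustration, not part of any existing tooling:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CertInfo:
    """Hypothetical summary of one issued cert (values supplied by the caller)."""
    vendor: str
    key_type: str      # "ECC" or "RSA"
    sans: frozenset    # subjectAltName DNS entries
    not_after: date    # expiry date


def check_pair(a: CertInfo, b: CertInfo, min_offset_days: int = 30) -> list:
    """Return a list of human-readable problems; an empty list means the pair looks OK."""
    problems = []
    if a.vendor == b.vendor:
        problems.append("both certs come from the same vendor")
    if a.key_type != b.key_type:
        problems.append("key types differ; compare ECC with ECC and RSA with RSA")
    if a.sans != b.sans:
        # Symmetric difference: names present in exactly one of the two certs.
        diff = sorted(a.sans ^ b.sans)
        problems.append("SAN sets differ: " + ", ".join(diff))
    offset = abs((a.not_after - b.not_after).days)
    if offset < min_offset_days:
        problems.append(f"expiry dates only {offset} days apart")
    return problems
```

Running `check_pair` once for the ECC pair and once for the RSA pair would cover all four certs.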
We'll puppetize such that VendorA's certs are live in one set of datacenters and VendorB's are live in another under normal conditions. With both in active use, we'll be assured they're both normally working properly on fine details like browser compatibility, OCSP, PKP, etc. By splitting on regions (rather than other arbitrary splits, such as by cluster or host), we avoid issues with individual clients commonly bouncing between two disparate certs and the subtle, hard-to-predict effects that may have (e.g. on performance or HTTP/2 coalescing).
If we run into another rare operational issue affecting one of the active cert vendors, with a very trivial puppet change (just a 2-3 line nginx config change + nginx reload) we can switch to the remaining functional cert at all datacenters.
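The selection logic behind that switch can be sketched as a small lookup: each datacenter has a normal vendor, and marking one vendor as failed moves every datacenter to the other. This is only an illustration of the intended behavior, not the actual puppet code; the datacenter names (`dc1` to `dc4`), the `NORMAL_ASSIGNMENT` table, and `live_vendor` are all hypothetical:

```python
from typing import Optional

VENDORS = ("VendorA", "VendorB")

# Hypothetical normal-conditions assignment of datacenter -> live cert vendor.
NORMAL_ASSIGNMENT = {
    "dc1": "VendorA",
    "dc2": "VendorA",
    "dc3": "VendorB",
    "dc4": "VendorB",
}


def live_vendor(datacenter: str, failed_vendor: Optional[str] = None) -> str:
    """Return the vendor whose certs should be served at this datacenter."""
    normal = NORMAL_ASSIGNMENT[datacenter]  # KeyError for unknown datacenters
    if failed_vendor is None:
        return normal
    if failed_vendor not in VENDORS:
        raise ValueError(f"unknown vendor: {failed_vendor}")
    # One vendor is unusable: every datacenter serves the surviving vendor.
    return next(v for v in VENDORS if v != failed_vendor)
```

Because both vendors' keys and OCSP staples are already deployed everywhere, flipping `failed_vendor` only changes which cert files the nginx config points at.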
Vendor selection is out of scope for this ticket, but essentially we need to select two separate, independent vendors (no shared trust chain) that we can trust and that meet all of our operational needs (especially: easy issuance of large SAN lists with multiple wildcards, and dual issuance of ECC+RSA certs).
What is in scope for this ticket is making the actual changes necessary to deploy the dual-vendor keys after we've purchased them, and documenting a simple procedure for switching them.