This was originally on our long-term radar as part of the (forever-stalled and in-discussion!) [H]PKP ticket: T92002. The recent GlobalSign issue has highlighted the need to break this out as a higher-priority action that we should take on independently of that ticket.
We need to obtain our "unified" cert from two vendors with the same SAN set, in both ECC and RSA forms. Ideally the annual renewal time for each should be at least slightly offset (~1 month?). We'll puppetize the deployment of both keys to all of the cache clusters, including live OCSP staple fetching for both.
Only one will be in active use at any given time. The intent is that, if we run into another rare operational issue affecting the active cert vendor, we can switch our infrastructure to the other cert vendor with a trivial operational change (just a 2-3 line nginx config change + nginx reload), without blocking on anything complex or external to operations or the organization.
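As a rough sketch of what that switch might look like, assuming the cert and key paths below (the file names and the "vendor-a" / "vendor-b" labels are hypothetical placeholders, not our real puppet layout) and an nginx new enough (1.11.0+) to accept an ECDSA and an RSA cert side by side in one server block:

```nginx
# Hypothetical paths and vendor labels, for illustration only.
# Active vendor: vendor-a. To fail over, change "vendor-a" to "vendor-b"
# in the lines below, then test and reload nginx.
ssl_certificate     /etc/ssl/localcerts/vendor-a-unified-ecdsa.chained.crt;
ssl_certificate_key /etc/ssl/private/vendor-a-unified-ecdsa.key;
ssl_certificate     /etc/ssl/localcerts/vendor-a-unified-rsa.chained.crt;
ssl_certificate_key /etc/ssl/private/vendor-a-unified-rsa.key;
```

After the edit, `nginx -t` validates the config and `nginx -s reload` applies it without dropping existing connections. In practice the active vendor would more likely be a single puppet-managed setting than a hand edit, and the OCSP stapling config would need to follow the same switch.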
It's tempting to consider keeping both certs live on different clusters/hosts, but this would probably not be worth the confusion it could cause, and may have subtle negative effects that are difficult to predict (e.g. perf, HTTP/2 connection coalescing, etc.).
Vendor selection is out of scope for this ticket, but essentially we need to select two separate, independent vendors (no shared trust chain) that we can trust and that meet all of our operational needs (especially: easy issuance of large SAN lists with multiple wildcards, and dual issuance of ECC+RSA certs).
What is in scope for this ticket is making the actual changes necessary to deploy the dual-vendor keys after we've purchased them, and documenting a simple procedure for switching them.