Renew GlobalSign Unified in 2018
Closed, Resolved · Public

Description

Our big GlobalSign unified certificate expires at 2018-11-22 07:59, which means it's time to get the renewal going soon!

The basic requirements from last time around:

  • We want only 1yr validity, no extended "free" deals of significant length or anything.
  • We want to generate the new public certs using fresh private keys, not re-use existing ones.
  • RSA + ECDSA pair certs (no-cost re-issuance, to support certain legacy UAs that still require RSA)
  • Embedded SCTs (I believe this should be a given now, whereas it was a special request last time around)
  • We should aim for a minimum 5-day client clock skew window per stats from T196248 for the big unified. We have to watch clock skew in both directions, so e.g. if we get a start date that's 10 days before the old one's end-date, and switch exactly at the middle-point, we'll have 5 days of clock skew protection in both directions. Preferably we give a little more headroom than that, so that we don't have to switch at the exact middle point to meet this.
  • If we can get the new one's end date to be even earlier than the last (earlier than Nov 22), we can increase our inter-vendor spread, which currently sits at 63 days.
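The two-directional clock skew reasoning in the list above can be sketched in a few lines. This is only an illustration: the function name `skew_margins` is hypothetical, and the dates are the made-up "10 days of overlap, switch at the midpoint" case from the bullet, not the real certificate dates.

```python
from datetime import datetime, timedelta, timezone


def skew_margins(switch_at, new_not_before, old_not_after):
    """Return (slow_clock_margin, fast_clock_margin) as timedeltas.

    slow_clock_margin: how far behind a client clock can be and still
    see the new cert as already valid at switch time.
    fast_clock_margin: how far ahead a client clock can be and still
    see the old cert as unexpired right up until the switch.
    """
    return switch_at - new_not_before, old_not_after - switch_at


# Hypothetical example: new cert starts 10 days before the old one ends.
old_not_after = datetime(2018, 11, 22, 7, 59, tzinfo=timezone.utc)
new_not_before = old_not_after - timedelta(days=10)
midpoint = new_not_before + (old_not_after - new_not_before) / 2

slow, fast = skew_margins(midpoint, new_not_before, old_not_after)
print(slow, fast)  # 5 days, 0:00:00 5 days, 0:00:00
```

Switching anywhere other than the exact midpoint trades margin on one side for margin on the other, which is why extra overlap beyond 10 days is preferable.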

Additionally, I think we're going to take a last-minute look at whether it's reasonable to ask for the Must-Staple extension this time around (T204987), assuming GlobalSign can do it and we decide we're operationally ready for it, or whether we should defer on that for now. I'm still on the fence on this until I re-examine everything about our current OCSP setup and the latest outside info. At the very least, we probably want to resolve the issues raised in T163541 first, and if we can't be comfortable with it yet this year, it's not the end of the world.

Note: the companion redundant Digicert unified doesn't expire until 2019-01-24, and will be tasked separately at a later date. Not everything has been decided yet about exactly how we'll handle that one this time around.
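As a quick check of the 63-day inter-vendor spread mentioned in the requirements, here is a minimal sketch using only the two expiry dates given in this description. The variable names are illustrative, and times of day are ignored since only a date is given for the Digicert cert.

```python
from datetime import date

# Expiry dates as stated in this task description.
globalsign_expiry = date(2018, 11, 22)
digicert_expiry = date(2019, 1, 24)

spread_days = (digicert_expiry - globalsign_expiry).days
print(spread_days)  # 63
```

Any new GlobalSign end date earlier than Nov 22 makes this difference larger, which is the point of the last bullet.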

BBlack created this task. Oct 11 2018, 8:42 PM
BBlack triaged this task as Normal priority.
Restricted Application added a project: Operations. Oct 11 2018, 8:42 PM
Restricted Application added a subscriber: Aklapper.
Dzahn added a subscriber: Dzahn. Oct 11 2018, 11:47 PM
revi added a subscriber: revi. Oct 12 2018, 3:52 AM

Also remembering there are some stats @Krinkle mentioned in T196248 about clock skews and users from some Google research. The TL;DR there was that 24 hours gives us 93.3%, and 5 days is the sweet spot at 99.6%, after which it takes a lot more time to get significant gains. Will update the minimum/desirable timings above accordingly.

BBlack updated the task description. Oct 12 2018, 11:15 AM
ema moved this task from Triage to TLS on the Traffic board. Oct 22 2018, 8:43 AM
BBlack added a subtask: Unknown Object (Task). Nov 2 2018, 7:00 PM

Must-Staple didn't turn out to be a realistic option for GlobalSign; we'll look at it again later/elsewhere!

Change 472578 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Add globalsign 2018 unified certs

https://gerrit.wikimedia.org/r/472578

Change 472579 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Deploy inactive globalsign 2018 unified certs

https://gerrit.wikimedia.org/r/472579

The dual RSA+ECDSA certs above have:

Not Before: Nov  8 21:37:02 2018 GMT
Not Before: Nov  8 21:21:04 2018 GMT

Which leaves us plenty of room for clock skew on the deploy (whew!).
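For anyone redoing this arithmetic, the two Not Before strings above can be parsed directly. The sketch below assumes the timestamps are in the textual form shown here (as printed by OpenSSL's certificate text output) and uses a hypothetical helper name, `parse_openssl_time`. If both certs of the pair are switched at the same moment, the later Not Before is the one that limits how early the switch can happen, so the sketch takes the maximum.

```python
from datetime import datetime, timezone


def parse_openssl_time(text):
    """Parse a timestamp like 'Nov  8 21:37:02 2018 GMT' into an aware datetime."""
    # "GMT" is matched as a literal, so the UTC zone is attached explicitly.
    parsed = datetime.strptime(text, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


not_before_values = [
    parse_openssl_time("Nov  8 21:37:02 2018 GMT"),
    parse_openssl_time("Nov  8 21:21:04 2018 GMT"),
]

# The later start date is the conservative one for the pair as a whole.
effective_not_before = max(not_before_values)
print(effective_not_before.isoformat())  # 2018-11-08T21:37:02+00:00
```

The two values differ by about 16 minutes, which is negligible against a multi-day switch window.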

If we assume we want 5 days in both directions, our window for swapping the new ones into active is anywhere between ~2018-11-13 22:00 and ~2018-11-17 08:00 (all times UTC!), which gives us most of the next work week. We'll do some final rechecking and validation in the meantime! The commits merging shortly just move the cert into place and begin OCSP stapling, but don't actually make it live for users.

Change 472578 merged by BBlack:
[operations/puppet@production] Add globalsign 2018 unified certs

https://gerrit.wikimedia.org/r/472578

Change 472579 merged by BBlack:
[operations/puppet@production] Deploy inactive globalsign 2018 unified certs

https://gerrit.wikimedia.org/r/472579

Seems to be testing fine on https://pinkunicorn.wikimedia.org/ , and the pre-deployment to all cache hosts and OCSP Stapling looks fine too.

The skew window for the transition is 2018-11-08T21:21:04 (notbefore in new set) -> 2018-11-22T07:59:59 (notafter in old set). So if we want a minimum of 5 full days on either side of the clock skew issue, our window for switching is anywhere within: 2018-11-13T21:21:04 -> 2018-11-17T07:59:59 (later today through Saturday morning EU time).
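The window arithmetic above can be reproduced with a small sketch. The function name `switch_window` is hypothetical; the inputs are the notbefore/notafter values quoted in this comment and the 5-day margin from the task description.

```python
from datetime import datetime, timedelta, timezone


def switch_window(new_not_before, old_not_after, margin):
    """Return (opens, closes): the times between which switching keeps at
    least `margin` of clock skew protection in both directions."""
    opens = new_not_before + margin
    closes = old_not_after - margin
    if opens > closes:
        raise ValueError("cert overlap is too short for the requested margin")
    return opens, closes


new_not_before = datetime(2018, 11, 8, 21, 21, 4, tzinfo=timezone.utc)
old_not_after = datetime(2018, 11, 22, 7, 59, 59, tzinfo=timezone.utc)

opens, closes = switch_window(new_not_before, old_not_after, timedelta(days=5))
print(opens.isoformat())   # 2018-11-13T21:21:04+00:00
print(closes.isoformat())  # 2018-11-17T07:59:59+00:00
```

With a larger margin the window shrinks from both ends, and the function raises once the overlap can no longer satisfy it.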

I'm going to tentatively set the switch time for 10AM tomorrow morning my time, which is:
UTC: 2018-11-14T16:00:00
California: 2018-11-14T08:00:00
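A small sanity check of the time conversion and of the chosen slot against the switch window quoted earlier in this comment. The fixed UTC offsets are assumptions: UTC-6 is simply what "10AM local equals 16:00 UTC" implies, and UTC-8 is Pacific Standard Time, which California observes in mid-November.

```python
from datetime import datetime, timedelta, timezone

switch_utc = datetime(2018, 11, 14, 16, 0, 0, tzinfo=timezone.utc)

# Assumed fixed offsets (see the note above); not looked up from a tz database.
author_local = timezone(timedelta(hours=-6))
california = timezone(timedelta(hours=-8))

local_text = switch_utc.astimezone(author_local).strftime("%Y-%m-%dT%H:%M:%S")
california_text = switch_utc.astimezone(california).strftime("%Y-%m-%dT%H:%M:%S")

# Window bounds as quoted in this comment (5 days of margin on each side).
window_opens = datetime(2018, 11, 13, 21, 21, 4, tzinfo=timezone.utc)
window_closes = datetime(2018, 11, 17, 7, 59, 59, tzinfo=timezone.utc)
in_window = window_opens <= switch_utc <= window_closes

print(local_text, california_text, in_window)
# 2018-11-14T10:00:00 2018-11-14T08:00:00 True
```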

Change 473211 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Switch unified cert to globalsign-2018 at US edges

https://gerrit.wikimedia.org/r/473211

Mentioned in SAL (#wikimedia-operations) [2018-11-14T16:09:29Z] <bblack> starting replacement of GlobalSign unified TLS cert at US edges (affects all public TLS termination for US traffic edges) - T206804

Mentioned in SAL (#wikimedia-operations) [2018-11-14T16:10:55Z] <bblack> disabling puppet as precaution on all caches (cumin A:cp) - T206804

Change 473211 merged by BBlack:
[operations/puppet@production] Switch unified cert to globalsign-2018 at US edges

https://gerrit.wikimedia.org/r/473211

Mentioned in SAL (#wikimedia-operations) [2018-11-14T16:33:20Z] <bblack> [Done] replacement of GlobalSign unified TLS cert at US edges complete - T206804

BBlack mentioned this in Unknown Object (Task). Wed, Nov 14, 5:16 PM
BBlack closed subtask Unknown Object (Task) as Resolved. Wed, Nov 14, 5:19 PM
BBlack closed this task as Resolved.

Change 475489 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Remove old globalsign unified certs from config

https://gerrit.wikimedia.org/r/475489

Change 475490 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Remove old globalsign unified cert files

https://gerrit.wikimedia.org/r/475490

Change 475489 merged by BBlack:
[operations/puppet@production] Remove old globalsign unified certs from config

https://gerrit.wikimedia.org/r/475489

Change 475490 merged by BBlack:
[operations/puppet@production] Remove old globalsign unified cert files

https://gerrit.wikimedia.org/r/475490

Mentioned in SAL (#wikimedia-operations) [2018-11-23T16:53:43Z] <bblack> cleaned up remnants of globalsign-2017 unified cert (OCSP cache/config, unmanaged cert files, etc) on all cpNNNN - T206804