
Use OCSP Stapling on misc cluster
Closed, Resolved, Public

Description

We should do OCSP Stapling for the misc cluster as well. There are some quibbles to sort out about whether the planet cert will work correctly with the OCSP infrastructure we have today (the wmfusercontent and wm.o certs should be fine). I believe stapling is a server-global thing for nginx in practice, even though it appears to be configured per-cert, both in nginx and through puppet...

From @Dzahn's original report of direct-OCSP intermittent failure:

Event Timeline

Dzahn created this task. Apr 29 2015, 4:32 AM
Dzahn raised the priority of this task to Needs Triage.
Dzahn updated the task description.
Dzahn added a subscriber: Dzahn.
Restricted Application added a subscriber: Aklapper. Apr 29 2015, 4:32 AM
Dzahn added a comment (edited). Apr 29 2015, 4:38 AM

just happened like once or twice and then after reloading it went away and i could use phab again

yea, my settings are not default

Dzahn set Security to None.
Dzahn added a subscriber: BBlack.
Dzahn triaged this task as Medium priority. Apr 29 2015, 4:46 AM

setting priority i don't know. it won't affect everybody and it didn't happen a lot but when it does it looks pretty broken but then i can just hit reload and it is gone. afair bblack has looked into the OCSP server internal error thing before.

What you're seeing here is an intermittent failure of GlobalSign's OCSP service (which is hosted via Cloudflare). We get these all the time for production, too. Their service is flaky.

The difference for primary text/upload/mobile/etc production endpoints is that we're fetching these on our own servers and caching them for the users to see (OCSP Stapling), so that user browsers don't have to fetch from GlobalSign/Cloudflare themselves. This also allows us to survive some transient failures of GlobalSign/Cloudflare's OCSP service (which you can see reports of if you search cron emails for "OCSP"). There are icinga alerts that make sure the cron errors don't pile up enough to result in a real user problem.
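To make the stapling behaviour above a bit more concrete, here is a minimal Python sketch of why a server-side cache survives transient OCSP failures. It assumes a hypothetical cache that a cron-style job refreshes: a failed fetch leaves the last good response in place, and the response keeps being stapled until its `nextUpdate` time (the OCSP validity field) passes. The names `OcspResponse`, `StapleCache`, `refresh`, `staple` and `needs_alert` are hypothetical, and the alert rule (fire when less than a chosen margin of validity remains) is an assumption for illustration, not the actual icinga check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class OcspResponse:
    # Hypothetical minimal model: only the OCSP validity window and the raw bytes matter here.
    this_update: datetime
    next_update: datetime
    der: bytes


class StapleCache:
    """Hypothetical cache: keeps the last good OCSP response across failed refreshes."""

    def __init__(self) -> None:
        self.current: Optional[OcspResponse] = None
        self.consecutive_failures = 0

    def refresh(self, fetch: Callable[[], OcspResponse]) -> bool:
        # Called periodically (e.g. from cron); a failed fetch does NOT discard the old response.
        try:
            resp = fetch()
        except Exception:
            self.consecutive_failures += 1
            return False
        self.current = resp
        self.consecutive_failures = 0
        return True

    def staple(self, now: datetime) -> Optional[bytes]:
        # Serve the cached response only while it is still inside its validity window.
        if self.current is None or now >= self.current.next_update:
            return None
        return self.current.der

    def needs_alert(self, now: datetime, margin: timedelta) -> bool:
        # Assumed alert rule: warn while there is still time to fix fetching,
        # i.e. before the cached response actually expires for users.
        if self.current is None:
            return True
        return self.current.next_update - now < margin
```

With a response valid for a few days, a single failed refresh from a flaky upstream changes nothing that users see; only a run of failures long enough to approach `nextUpdate` would trigger the (assumed) alert.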

There's a related ticket to improve our Stapling stuff even further here, FWIW: T93927

I'm going to re-title this ticket and make it about extending our OCSP Stapling to the misc cluster as well...

BBlack renamed this task from OCSP server experienced an internal error to Use OCSP Stapling on misc cluster. Apr 29 2015, 2:00 PM
BBlack updated the task description.

Change 212257 had a related patch set uploaded (by BBlack):
enable OCSP for misc cluster certs T97506

https://gerrit.wikimedia.org/r/212257

Change 212257 merged by BBlack:
enable OCSP for misc cluster certs T97506

https://gerrit.wikimedia.org/r/212257

BBlack closed this task as Resolved. May 20 2015, 7:49 AM
BBlack claimed this task.

Planet cert worked fine (their OCSP update lifetime is much longer than GlobalSign's), so I went ahead and turned this on. Seems to be working for me.

BBlack moved this task from Triage to Done on the Traffic board. May 20 2015, 7:49 AM