
Windows 10 & MacOS Sierra Certificate errors due to GlobalSign
Closed, Resolved · Public

Description

Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20161013-GlobalSign

It appears that macOS Sierra users are getting certificate errors when using Safari and Chrome.
Microsoft Edge and IE on Windows 10 also appear to be affected.

The problem is caused by GlobalSign and has been reported here: https://twitter.com/globalsign/status/786505261842247680

If you're looking for an immediate workaround:

MacOS Sierra: install and use Firefox rather than Safari or Chrome
Win10: Opera 12, Firefox

Firefox can be downloaded from https://www.firefox.com/

Screenshots




Event Timeline

Zppix created this task.Oct 13 2016, 3:40 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 13 2016, 3:40 PM
Zppix updated the task description. (Show Details)Oct 13 2016, 3:42 PM

The workaround for now is to use Firefox, as it has its own TLS stack different from the OS one.

jcrespo triaged this task as Unbreak Now! priority.Oct 13 2016, 3:43 PM
Restricted Application added subscribers: Jay8g, Luke081515, TerraCodes. · View Herald TranscriptOct 13 2016, 3:43 PM
Zppix added a comment.Oct 13 2016, 3:47 PM

Doesn't appear to affect the iOS 8.1 app for Wikipedia.

These have been some of the updates we had recently:

We don't yet understand the full scope or specifics of either the
underlying issue GlobalSign is having, or any impact it's having on
us. I don't think the issue is widespread among our readers, as we're
not seeing a lot of reports, but there may be specific sub-groups
affected (e.g. mac users who used the site during a certain time
window, or only users of certain browsers which accessed us from
behind a certain kind of corporate TLS proxy, etc).
GlobalSign's own status page says the issue is ongoing, but only
affects some certificates: https://www.globalsign.com/en/status/ .
Note within that status update, they provide a workaround link for
affected clients (which is something we could send to people reporting
issues): https://support.globalsign.com/customer/portal/articles/1353318-view-and-or-delete-crl-ocsp-cache
The claims GlobalSign is making about the incident, as well as the
reports from other affected GlobalSign customers, are that the issue is
related to OCSP validation of certificates. What we know about our
OCSP situation boils down to:

  1. Browsers all handle OCSP a little differently: some may not check it at all, some may only check it if we fail to do OCSP Stapling in our response, there could be vendor-specific bugs/issues, etc...
  2. It's possible some corporate TLS proxies interfere with OCSP Stapling or direct browser OCSP fetches in some unique way
  3. We staple our OCSP responses, meaning we send the OCSP data in our response directly, which modern browsers should believe, and thus not have to verify OCSP independently from GlobalSign.
  4. The OCSP data we staple is something our servers fetch from GlobalSign and cache locally. We validate that the OCSP response is correct and sane before caching it, and we can keep sending valid cached OCSP stapling for up to 4 days if GlobalSign's servers stop responding correctly.
  5. So far, we haven't seen any indication of any failure in our servers' fetches of OCSP from GlobalSign. Each of our cache servers refreshes the data once an hour, and in aggregate that means we hit their OCSP servers about twice a minute from our infra. So, as far as we know, we're sending good staple data, and modern clients shouldn't be having any OCSP-related issue. When we get more tech details from GlobalSign (or get an actual user with a persistent issue on IRC where we can debug with them), we might get more insight.

-Brandon
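
The caching behaviour described in points 4 and 5 of the message above can be sketched roughly as follows. This is only an illustration: the function names and return labels are hypothetical, and the one-hour refresh interval and four-day limit are taken from the message itself, not from the actual server configuration.

```python
from datetime import datetime, timedelta

# Values quoted in the message above; everything else here is an assumption.
REFRESH_INTERVAL = timedelta(hours=1)  # each cache server refreshes hourly
MAX_STAPLE_AGE = timedelta(days=4)     # cached staples stay usable for up to 4 days


def staple_action(last_good_fetch: datetime, now: datetime) -> str:
    """Decide what a cache server would do with its cached OCSP response."""
    age = now - last_good_fetch
    if age > MAX_STAPLE_AGE:
        return "stop-stapling"        # cached response is too old to send
    if age >= REFRESH_INTERVAL:
        return "refresh-and-staple"   # try a new fetch, keep serving the cache
    return "staple-cached"            # cache is fresh enough, no fetch needed


def implied_server_count(fetches_per_minute: float) -> int:
    """Number of servers implied by an aggregate fetch rate at one fetch per interval."""
    return round(fetches_per_minute * REFRESH_INTERVAL.total_seconds() / 60)
```

With hourly refreshes, the "about twice a minute" aggregate rate mentioned above corresponds to roughly 120 cache servers (`implied_server_count(2)`).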

We still don't have all the details, but:

  • We think this problem is limited to users of either Safari or Chrome on the latest Mac OS X: Sierra (very new). And possibly not all of them. (Our traffic stats show practically no impact.)
  • Firefox seems unaffected by this, so we might want to recommend that affected users try Firefox until GlobalSign resolves the issue.

-Mark

Zppix updated the task description. (Show Details)Oct 13 2016, 3:49 PM
ema moved this task from Triage to TLS on the Traffic board.Oct 13 2016, 3:49 PM
Zppix renamed this task from MacOS Sierra Cert errors to Windows 10 & MacOS Sierra Cert errors.Oct 13 2016, 3:52 PM
Zppix added a comment.Oct 13 2016, 3:55 PM

Clearing the CertUtil cache on Edge doesn't fix the issue.

ema updated the task description. (Show Details)Oct 13 2016, 3:56 PM
Zppix updated the task description. (Show Details)Oct 13 2016, 3:58 PM

Edge is completely blocking access to WMF sites as shown in screenshot number 2 in the task description

BBlack updated the task description. (Show Details)Oct 13 2016, 3:59 PM
Zppix updated the task description. (Show Details)Oct 13 2016, 4:02 PM
Zppix updated the task description. (Show Details)
Zppix moved this task from Backlog to Certificates on the HTTPS board.
Joe added a subscriber: Joe.Oct 13 2016, 4:08 PM
Zppix updated the task description. (Show Details)Oct 13 2016, 4:10 PM
Zppix updated the task description. (Show Details)
Zppix added a comment.Oct 13 2016, 4:20 PM

A user reports on IRC: earlier today (9-10 a.m. US Eastern time) only *.wikimedia.org and wikimediafoundation.org sites were affected by the cert problem, but *.wikipedia.org wasn't. Now the Wikipedias are also affected. (Pulled from the wikimedia-operations IRC channel logs.)

Zppix added a comment.Oct 13 2016, 4:36 PM

A user in ENWIKI's help IRC channel reports the error on the latest version of Windows 10 Professional, on Chrome - Version 54.0.2840.59 beta-m

Zppix updated the task description. (Show Details)Oct 13 2016, 4:37 PM
Zppix updated the task description. (Show Details)Oct 13 2016, 4:44 PM

Chrome still works for me. The problem seems to be spreading, so it may affect Firefox soon.

Joe added a comment.Oct 13 2016, 4:49 PM

@Paladox: firefox will keep working fine as it uses a different TLS stack from the one provided by the OS.

GlobalSign suggested the following workaround; it's unclear whether it actually works: https://support.globalsign.com/customer/portal/articles/1353318-view-and-or-delete-crl-ocsp-cache

No, it doesn't work on macOS Sierra with either Safari or Chrome.

Zppix added a comment.Oct 13 2016, 4:53 PM

@Pietrodn so firefox doesnt work on mac?

Mholloway updated the task description. (Show Details)Oct 13 2016, 4:54 PM

@Pietrodn so firefox doesnt work on mac?

Wikipedia on Firefox works fine on macOS Sierra. Seems to be the only workaround.
iOS is unaffected, also.

DatGuy added a subscriber: DatGuy.Oct 13 2016, 5:02 PM

<pietrodn> More detailed GlobalSign explanation of the problem https://twitter.com/globalsign/status/786612660397715456

More detailed explanation of the technical problem by GlobalSign:
https://downloads.globalsign.com/acton/fs/blocks/showLandingPage/a/2674/p/p-008f/t/page/fm/0

Dear Valued GlobalSign Customer,
As most of you are aware, we are experiencing an internal process issue (details below) that is impacting your business. While we have identified the root-cause, we deeply apologize for the problems this is causing you and wanted to ensure you that we are actively resolving the issue.
GlobalSign manages several root certificates and for compatibility and browser ubiquity reasons provides several cross-certificates between those roots to maximize the effectiveness across a variety of platforms. As part of a planned exercise to remove some of those links, a cross-certificate linking two roots together was revoked. CRL responses had been operational for 1 week, however an unexpected consequence of providing OCSP responses became apparent this morning, in that some browsers incorrectly inferred that the cross-signed root had revoked intermediates, which was not the case.
GlobalSign has since removed the cross-certificate from the OCSP database and cleared all caches. However, the global nature of CDNs and effectiveness of caching continued to push some of those responses out as far as end users. End users cannot always easily clear their caches, either through lack of knowledge or lack of permission. New users (visitors) are not affected as they will now receive good responses.
The problem will correct itself in 4 days as the cached responses expire, which we know is not ideal. However, in the meantime, GlobalSign will be providing an alternative issuing CA for customers to use instead, issued by a different root which was not affected by the cross that was revoked, but offering the same ubiquity and does not require to reissue the certificate itself.
We are currently working on the detailed instructions to help you resolve the issue and will communicate those instruction to you shortly.
Thank you for your patience.
Lila Kee
Chief Product Officer
GMO GlobalSign
US +1 603-570-7060 | UK +44 1622 766 766 | EU +32 16 89 1900
www.globalsign.com/en

(if you cannot access that due to SSL errors: https://twitter.com/globalsign/status/786612660397715456 )

Change 315705 had a related patch set uploaded (by BBlack):
GlobalSign G2 intermediate, signed by R3

https://gerrit.wikimedia.org/r/315705

Legoktm updated the task description. (Show Details)Oct 13 2016, 5:13 PM

Change 315705 merged by BBlack:
GlobalSign G2 intermediate, signed by R3

https://gerrit.wikimedia.org/r/315705

Working workaround for Chrome and Safari on macOS Sierra: http://apple.stackexchange.com/a/257112/33925

$ sqlite3 ~/Library/Keychains/*/ocspcache.sqlite3 'DELETE FROM responses WHERE responderURI LIKE "%http://%.globalsign.com/%";'

And restart the browser after that.
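
For anyone who prefers a script over the one-liner, the same cleanup can be sketched in Python. This is a minimal sketch, assuming the cache layout is exactly what the command above implies (a `responses` table with a `responderURI` column); the function name `purge_globalsign_ocsp` is hypothetical. The browser still needs to be restarted afterwards.

```python
import glob
import os
import sqlite3

# Same location the shell command above targets (assumed, not verified here).
DEFAULT_PATTERN = os.path.expanduser("~/Library/Keychains/*/ocspcache.sqlite3")


def purge_globalsign_ocsp(pattern: str = DEFAULT_PATTERN) -> int:
    """Delete cached GlobalSign OCSP responses; return the number of rows removed."""
    removed = 0
    for path in glob.glob(pattern):  # only existing cache files are touched
        conn = sqlite3.connect(path)
        try:
            with conn:  # commits on success, rolls back on error
                cur = conn.execute(
                    "DELETE FROM responses WHERE responderURI LIKE ?",
                    ("%http://%.globalsign.com/%",),
                )
                removed += cur.rowcount
        finally:
            conn.close()
    return removed


if __name__ == "__main__":
    print(f"Removed {purge_globalsign_ocsp()} cached GlobalSign OCSP responses")
```

Unlike the shell glob, this loops over every matching cache file, so it also copes with more than one keychain directory.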

Mentioned in SAL (#wikimedia-operations) [2016-10-13T17:33:32Z] <bblack> pushing new intermediate to caches - T148045

BBlack added a subscriber: BBlack.Oct 13 2016, 5:37 PM

We've received an updated intermediate cert from GlobalSign that's compatible with our existing end-certs and supposedly fixes the issue. It's deployed now, please re-test on the primary production wiki domains. (it may not yet be fixed for some of our one-off tech infrastructure sites).

We're now working through the other minor one-off cert issues on smaller sites (mostly ones for technical folks). I'm breaking off a separate subtask about the issues there...

hashar added a subscriber: hashar.Oct 13 2016, 7:27 PM

OCG on ocg1001, ocg1002 and ocg1003 started yielding CERT_UNTRUSTED errors at 17:30 UTC

One can monitor it via Grafana backend success/error https://grafana.wikimedia.org/dashboard/db/ocg?panelId=8&fullscreen

Or in logstash looking for CERT_UNTRUSTED https://logstash.wikimedia.org/goto/9c1966784d2364340b6159a52d72f586

Nuria added a comment.Oct 13 2016, 7:54 PM

Will get numbers for Mac OS requests on Chrome and Safari per hour for the last 3 days to quantify impact; let me know if you no longer need those. Selects are running now and will probably take a couple of hours.

@Nuria - it would have to be specifically for MacOS Sierra (the new version that came out less than a month ago). There were other UAs affected as well, such as Edge on Win10, and possibly Safari on iOS 10, and some reports about OperaMini mobile browsers as well. In some of these cases we can't really quantify what percentage of those UAs it affected, if it was intermittent, etc. But in any case, any numbers we generate on it now are just to know the impact after-the-fact, as we're already mitigating the problem with an upstream fixup from GlobalSign.
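
As a rough illustration of the per-hour breakdown discussed above, the sketch below buckets request rows by hour and by the user-agent families reported as affected in this task. It is not the query that was actually run: the function names, the `(timestamp, user_agent)` row shape, and the substring matching on UA strings are all assumptions, and iOS 10 and Opera Mini are left out because the reports for them were less certain.

```python
import re
from collections import Counter

# macOS Sierra is version 10.12; UA strings write it as 10_12 or 10.12.
SIERRA = re.compile(r"Mac OS X 10[._]12")


def affected_family(user_agent):
    """Return a coarse label for UAs reported as affected here, else None."""
    if "Firefox/" in user_agent:
        return None  # Firefox uses its own TLS stack and was reported unaffected
    if "Edge/" in user_agent and "Windows NT 10.0" in user_agent:
        return "edge-win10"
    if SIERRA.search(user_agent):
        if "Chrome/" in user_agent:  # Chrome UAs also contain "Safari/", so test first
            return "chrome-sierra"
        if "Safari/" in user_agent:
            return "safari-sierra"
    return None


def hourly_counts(rows):
    """rows: iterable of (ISO 8601 timestamp, user agent); counts per (hour, family)."""
    counts = Counter()
    for timestamp, user_agent in rows:
        family = affected_family(user_agent)
        if family is not None:
            counts[(timestamp[:13], family)] += 1  # first 13 chars: YYYY-MM-DDTHH
    return counts
```

Counts like these only bound the population that could have been affected; as noted above, they cannot show what share of those clients actually hit the error.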

Aklapper renamed this task from Windows 10 & MacOS Sierra Cert errors to Windows 10 & MacOS Sierra Certificate errors due to GlobalSign.Oct 14 2016, 10:56 AM

Resolving this. The mitigation deployed yesterday (alternate intermediate->root chain) seems to have worked, and the incident documentation is up at: https://wikitech.wikimedia.org/wiki/Incident_documentation/20161013-GlobalSign

faidon closed this task as Resolved.Oct 14 2016, 1:28 PM
faidon assigned this task to BBlack.
Aklapper updated the task description. (Show Details)

Change 322683 had a related patch set uploaded (by BBlack):
Revert "GlobalSign G2 intermediate, signed by R3"

https://gerrit.wikimedia.org/r/322683

Change 322683 merged by BBlack:
Revert "GlobalSign G2 intermediate, signed by R3"

https://gerrit.wikimedia.org/r/322683

hashar removed a subscriber: hashar.Nov 22 2016, 7:21 AM

Change 322913 had a related patch set uploaded (by BBlack):
Revert "Revert "GlobalSign G2 intermediate, signed by R3""

https://gerrit.wikimedia.org/r/322913

Change 322913 merged by BBlack:
Revert "Revert "GlobalSign G2 intermediate, signed by R3""

https://gerrit.wikimedia.org/r/322913