ca-certificates 20200601~deb10u1 lacks GeoTrust Global CA certificate required for APNS
Open, Medium, Public

Description

Current status: Connecting to APNS requires the GeoTrust Global CA certificate, which was removed from ca-certificates in the recently published version 20200601~deb10u1. We currently have the package pinned to the previous version, 20190110, in the Blubberfile. A change reverting the removal of the certificate was merged upstream, but a new version has not yet been published. We should monitor the upstream bug and use the newest version when it's available. If it's not yet published by the time we need to go to production, we should find an alternate solution, possibly https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962596#43.


Original bug:

I've updated the push notifications service in the Beta Cluster (on deployment-push-notifications01) with the commit adding APNS support, and configured it with the push-toolforge.p12 certificate and production: true for testing with the push-notifications-helper tool as described in src/outgoing/apns/readme.md.

Problem: Requests to APNS fail with the following response:

{
  "sent": [],
  "failed": [
    {
      "device": <device token>,
      "error": {
        "jse_shortmsg": "stream ended unexpectedly",
        "jse_info": {},
        "message": "stream ended unexpectedly"
      }
    }
  ]
}
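
A minimal Python sketch for pulling the per-device errors out of a response shaped like the one above. The function name summarize_failures and the sample token "example-device-token" are hypothetical; only the "sent", "failed", "device", and "error.message" fields are taken from the response shown.

```python
import json


def summarize_failures(response_text):
    """Return (device, message) pairs for every entry in "failed"."""
    response = json.loads(response_text)
    return [
        (entry.get("device"), entry.get("error", {}).get("message"))
        for entry in response.get("failed", [])
    ]


# Hypothetical sample: the real device token is elided in the report above.
sample = """
{
  "sent": [],
  "failed": [
    {
      "device": "example-device-token",
      "error": {
        "jse_shortmsg": "stream ended unexpectedly",
        "jse_info": {},
        "message": "stream ended unexpectedly"
      }
    }
  ]
}
"""

for device, message in summarize_failures(sample):
    print(f"{device}: {message}")
```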

The Beta Cluster push service can be tested locally by SSH'ing into deployment-push-notifications01 and forwarding port 8900:

ssh -L 8900:localhost:8900 deployment-push-notifications01.deployment-prep.eqiad1.wikimedia.cloud

Event Timeline

Restricted Application added a subscriber: Aklapper. Jun 11 2020, 4:50 PM
Mholloway triaged this task as High priority. Jun 11 2020, 4:51 PM

I don't think an HTTP proxy is required to reach the outside world from the Beta Cluster. Just to check, I configured the service to proxy HTTP requests through deployment-urldownloader02, and the result was the same.

Have you tried using the same device token locally with the same credentials? The first thing that comes to mind is that all requests to APNS are HTTP/2, so if we go through a proxy, the proxy must support it.

Yes, the same request succeeds when running the service locally with the same credential.

I got some debug logging output. Looks like openssl is having an issue with the cert. I'm not sure why it's happening in the Beta Cluster but not locally.

Jun 12 12:41:39 deployment-push-notifications01 docker[8495]: 2020-06-12T12:41:39.389Z apn Request ended with status null and responseData:
Jun 12 12:41:39 deployment-push-notifications01 docker[8495]: 2020-06-12T12:41:39.393Z apn Request error: Error [ERR_HTTP2_STREAM_CANCEL]: The pending stream has been canceled (caused by: unable to get local issuer certificate)
Jun 12 12:41:39 deployment-push-notifications01 docker[8495]: 2020-06-12T12:41:39.394Z apn Session error: Error: unable to get local issuer certificate
Jun 12 12:41:39 deployment-push-notifications01 docker[8495]: 2020-06-12T12:41:39.395Z apn Session closed

For reference, my local openssl version is OpenSSL 1.1.1 11 Sep 2018, and on deployment-push-notifications01 it's OpenSSL 1.1.1d 10 Sep 2019.

Mholloway updated the task description. Jun 12 2020, 1:27 PM
Mholloway updated the task description. Jun 12 2020, 1:34 PM

Doesn't look like APNS cert error, see T250493#6193114 with a similar error, fixed by installing ca-certificates in the pipeline image.

Dryrun mode works fine for me.

The container has ca-certificates, though, as a result of your patch.

runuser@7a7b6642c4b0:/srv/service$ apt list ca-certificates
Listing... Done
ca-certificates/now 20200601~deb10u1 all [installed,local]

I ran the following in the local Docker env and on beta:

openssl s_client -verify 2 -connect api.sandbox.push.apple.com:443

On local env I got:

....
Verification: OK
...

and on beta:

...
Verification error: unable to get local issuer certificate
...

So it's not an app-specific issue.
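
A rough Python equivalent of the openssl s_client check, for comparing environments or CA bundles. This is a sketch under assumptions: the function name verify_tls and its parameters are hypothetical, and passing cafile restricts trust to that one bundle instead of the system default.

```python
import socket
import ssl


def verify_tls(host, port=443, cafile=None, timeout=10.0):
    """Attempt a verified TLS handshake and return (ok, detail).

    With cafile=None the system default trust store is used; otherwise
    only the certificates in the given bundle file are trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # e.g. "unable to get local issuer certificate"
        return False, exc.verify_message
    except OSError as exc:
        # Covers refused connections, timeouts, and other TLS errors.
        return False, f"connection failed: {exc}"


# Example (needs network access):
# print(verify_tls("api.sandbox.push.apple.com"))
```

Running it in both environments should show whether the chain verifies against each trust store, independently of the push service itself.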

That said, does it make sense to change the base image for our local env so we avoid this type of issue in the future?

OK, I think I've found the issue: it appears that ca-certificates version 20190110 contains the GeoTrust Global CA certificate we need to connect to APNS, but version 20200601~deb10u1 (installed by default) does not. The command openssl s_client -verify 2 -connect api.sandbox.push.apple.com:443 finds the full certificate chain with the former, but fails to do so with the latter. We can fix this for now by specifically requesting ca-certificates=20190110 in the Blubberfile. (I'll add akosiaris as a sanity check on that solution.)
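
To confirm which ca-certificates version ships the root, one could check a bundle file for the certificate's common name. A minimal Python sketch, assuming the bundle is at Debian's usual path /etc/ssl/certs/ca-certificates.crt; the helper names common_names and bundle_has_ca are hypothetical.

```python
import ssl


def common_names(ca_certs):
    """Collect commonName values from dicts shaped like get_ca_certs() output."""
    names = set()
    for cert in ca_certs:
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.add(value)
    return names


def bundle_has_ca(cafile, common_name):
    """Return True if the PEM bundle contains a CA with this common name."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_verify_locations(cafile=cafile)
    return common_name in common_names(context.get_ca_certs())


# Example (assumed path, run inside the container):
# print(bundle_has_ca("/etc/ssl/certs/ca-certificates.crt", "GeoTrust Global CA"))
```

Per the findings above, this would be expected to report True with version 20190110 installed and False with 20200601~deb10u1.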

Change 605277 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/push-notifications@master] Blubber: Pin ca-certificates to version 20190110

https://gerrit.wikimedia.org/r/605277

Debian bug about this issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962596
And the initial bug leading to the removal of the certificate from ca-certificates: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911289

As it happens, the change has been reverted and it appears a new version will be released sometime soon: https://salsa.debian.org/debian/ca-certificates/-/commit/679daf6e9bf6fcdcb574b8029297d24836fafde0

All this being the case, I think we're safe pinning the version for the time being as I've done in the patch.

Change 605277 merged by jenkins-bot:
[mediawiki/services/push-notifications@master] Blubber: Pin ca-certificates to version 20190110

https://gerrit.wikimedia.org/r/605277

Mholloway added a comment. Edited Jun 15 2020, 2:00 PM

OK, APNS requests now succeed on the Beta Cluster. I'll keep this task open for tracking the upstream issue, since akosiaris commented that this is fine for beta but we shouldn't go to production with the package version pinned in this way.

If the ca-certificates update isn't published in the next week or so then maybe we're better off directly grabbing and installing the needed certificate as described at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962596#43.

Mholloway renamed this task from [BETA] APNS requests fail: "stream ended unexpectedly" to ca-certificates 20200601~deb10u1 lacks GeoTrust Global CA certificate required for APNS. Jun 15 2020, 2:07 PM
Mholloway lowered the priority of this task from High to Medium.
Mholloway updated the task description.
Mholloway added a comment. Edited Jun 19 2020, 10:24 PM

Apple's docs recommend directly installing the GeoTrust certificate.

IMPORTANT

To establish HTTP/2-based TLS sessions with APNs, you must ensure that a GeoTrust Global CA root certificate is installed on each of your providers. If a provider is running macOS, this root certificate is in the keychain by default. On other systems, this certificate might require explicit installation. You can download this certificate from the GeoTrust Root Certificates website. Here is a direct link to the certificate.

https://developer.apple.com/library/archive/documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/APNSOverview.html

Discussed with Alexandros; we're not doing this. We'll wait for the release of the fixed package.

Mholloway updated the task description. Mon, Jul 20, 3:47 PM

@Mholloway, without us knowing when the fixed package will be released, will this hold up the project indefinitely? There seems to be a lack of guidance on this matter in the broader community beyond following Apple's recommendation of installing the certificate directly from the GeoTrust website. I remember Alexandros was not in favor of this (nor of doing these types of installations from generic sources). But what is the workaround?

Hi @akosiaris, following up on this issue with ca-certificates which we discussed in our meeting about push notifications a few weeks ago. To recap, the most recent release of ca-certificates (20200601~deb10u1) removed the GeoTrust Global CA root certificate which is required to connect to APNS. A new release candidate (20200611) reverting the removal of this certificate was prepared but never released, and the package maintainer has been unresponsive since June 12 (last messages here and here).

At this point it doesn't look like we can safely assume that the fix will be released by the time we want to launch push notifications to production. I believe we have two workaround options:

  1. Pin ca-certificates to the last working version (20190110) (this is what we're currently doing)
  2. Directly download and install the certificate as described at T255169#6241557.

Is either of these acceptable for launching to production? Thank you.

Adding @Muehlenhoff in case he is able to push this a bit.

I don't like the latter much as handling individual CAs manually is bound to have a problematic update process. Let's see what @Muehlenhoff can add for the upstream bug, but we can keep carrying the version pin listed in 1. for now as a fix (it's also easy to revert when it's fixed).

I have no idea what's going on there. The ca-certificates maintainer hasn't followed up further, and there was already a ping on that task two weeks ago, so I'm not sure pinging it again would make any difference...

Let's keep the version pin for now and revert when it's finally fixed.