Page MenuHomePhabricator

Some Java clients unable to handle beta cluster TLS
Closed, ResolvedPublic

Description

Lately, the Android app has been unable to connect to the beta cluster because of an apparent SSL certificate issue. The stack trace is listed below.

Here are some things we know so far:

  • The issue started roughly 2-3 weeks ago.
  • It only seems to happen when using the OkHttp library (the de facto standard library for network communication in apps). It does not happen when connecting through Chrome on Android.
  • It happens on the latest devices (Android 9) with the latest updates.
  • I could also reproduce the issue on Windows, but not on Linux or macOS.
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
        at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:336)
        at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
        at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
        at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:107)
        at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:87)
        at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:221)
        at okhttp3.RealCall.execute(RealCall.java:81)
        at com.dmitrybrant.okhttpssltest.ssltest.main(ssltest.java:15)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
        at sun.security.validator.Validator.validate(Validator.java:260)
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1491)
        ... 28 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
        at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
        at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
        ... 34 more

Event Timeline

I haven't been able to track down anything going wrong on the server-side yet.
<Krenair> beta is testing getting the unified cert via acme-chief
<Krenair> we've discovered that certain java installs do not trust the site anymore
<Krenair> at least we assume it to be caused by this
<Krenair> but that we still don't fully understand what's going on because the same client seems able to talk fine to a prod misc site which also uses certs issued this way, and also some clients complain about different things - android seemed upset about revocation checking

@Dbrant, could you also paste the Android exception involving revocation checks?

Right, here we go:

    javax.net.ssl.SSLHandshakeException: Chain validation failed
        at com.android.org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:229)
        at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:336)
        at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
        at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
        at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:107)
        at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:87)
        at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at org.wikipedia.dataclient.okhttp.TestStubInterceptor.intercept(TestStubInterceptor.kt:17)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at org.wikipedia.dataclient.okhttp.OfflineCacheInterceptor.intercept(OfflineCacheInterceptor.java:84)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at org.wikipedia.dataclient.okhttp.DefaultMaxStaleRequestInterceptor.intercept(DefaultMaxStaleRequestInterceptor.kt:36)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at org.wikipedia.dataclient.okhttp.CommonHeaderRequestInterceptor.intercept(CommonHeaderRequestInterceptor.kt:22)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at org.wikipedia.dataclient.okhttp.StatusResponseInterceptor.intercept(StatusResponseInterceptor.kt:21)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at org.wikipedia.dataclient.okhttp.UnsuccessfulResponseInterceptor.intercept(UnsuccessfulResponseInterceptor.kt:11)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:223)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
2019-04-16 19:44:31.555 19672-19741/org.wikipedia.dev D/org.wikipedia.analytics.EventLoggingService:     at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:221)
        at okhttp3.RealCall.execute(RealCall.java:81)
        at org.wikipedia.analytics.EventLoggingService.lambda$log$0(EventLoggingService.java:73)
        at org.wikipedia.analytics.-$$Lambda$EventLoggingService$SxwWpT16qHlNDlLVHH7OAyvsBWo.run(Unknown Source:2)
        at io.reactivex.internal.operators.completable.CompletableFromAction.subscribeActual(CompletableFromAction.java:34)
        at io.reactivex.Completable.subscribe(Completable.java:2234)
        at io.reactivex.internal.operators.completable.CompletableSubscribeOn$SubscribeOnObserver.run(CompletableSubscribeOn.java:64)
        at io.reactivex.Scheduler$DisposeTask.run(Scheduler.java:578)
        at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66)
        at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:301)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
        at java.lang.Thread.run(Thread.java:764)
     Caused by: java.security.cert.CertificateException: Chain validation failed
        at com.android.org.conscrypt.TrustManagerImpl.verifyChain(TrustManagerImpl.java:707)
        at com.android.org.conscrypt.TrustManagerImpl.checkTrustedRecursive(TrustManagerImpl.java:539)
        at com.android.org.conscrypt.TrustManagerImpl.checkTrustedRecursive(TrustManagerImpl.java:560)
        at com.android.org.conscrypt.TrustManagerImpl.checkTrustedRecursive(TrustManagerImpl.java:628)
        at com.android.org.conscrypt.TrustManagerImpl.checkTrusted(TrustManagerImpl.java:495)
        at com.android.org.conscrypt.TrustManagerImpl.checkTrusted(TrustManagerImpl.java:418)
        at com.android.org.conscrypt.TrustManagerImpl.getTrustedChainForServer(TrustManagerImpl.java:339)
        at android.security.net.config.NetworkSecurityTrustManager.checkServerTrusted(NetworkSecurityTrustManager.java:94)
        at android.security.net.config.RootTrustManager.checkServerTrusted(RootTrustManager.java:88)
        at com.android.org.conscrypt.Platform.checkServerTrusted(Platform.java:208)
        at com.android.org.conscrypt.ConscryptFileDescriptorSocket.verifyCertificateChain(ConscryptFileDescriptorSocket.java:404)
        at com.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
        at com.android.org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:375)
        at com.android.org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:224)
        	... 54 more
     Caused by: java.security.cert.CertPathValidatorException: Response is unreliable: its validity interval is out-of-date
        at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:135)
        at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:222)
        at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:140)
        at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:79)
        at java.security.cert.CertPathValidator.validate(CertPathValidator.java:301)
        at com.android.org.conscrypt.TrustManagerImpl.verifyChain(TrustManagerImpl.java:703)
        	... 67 more
     Caused by: java.security.cert.CertPathValidatorException: Response is unreliable: its validity interval is out-of-date
        at sun.security.provider.certpath.OCSPResponse.verify(OCSPResponse.java:619)
        at sun.security.provider.certpath.RevocationChecker.checkOCSP(RevocationChecker.java:709)
        at sun.security.provider.certpath.RevocationChecker.check(RevocationChecker.java:363)
        at sun.security.provider.certpath.RevocationChecker.check(RevocationChecker.java:337)
        at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:125)
        	... 72 more
2019-04-16 19:44:31.556 19672-19741/org.wikipedia.dev D/org.wikipedia.analytics.EventLoggingService: 	Suppressed: java.security.cert.CertPathValidatorException: Could not determine revocation status
        at sun.security.provider.certpath.RevocationChecker.buildToNewKey(RevocationChecker.java:1092)
        at sun.security.provider.certpath.RevocationChecker.verifyWithSeparateSigningKey(RevocationChecker.java:910)
        at sun.security.provider.certpath.RevocationChecker.checkCRLs(RevocationChecker.java:577)
        at sun.security.provider.certpath.RevocationChecker.checkCRLs(RevocationChecker.java:465)
        at sun.security.provider.certpath.RevocationChecker.check(RevocationChecker.java:394)
        		... 74 more

A similar issue is also happening with Commons Android app where the app is unable to connect to beta cluster (https://commons.wikimedia.beta.wmflabs.org).

  • Works well on Android 8 and below.
  • I found issues while using both OkHttp and AbstractHttpClient (apache).

This article might be relevant as it talks about SSL certificate and similar stack traces.

https://developer.android.com/training/articles/security-ssl#java

I have also tried making network requests using a different library (Volley), and still getting the same errors, so we can rule out any issues that might be specific to any library.

I can also confirm that the errors occur only in the newest versions of Android (8.1 and 9.0), and not earlier versions.

Does it happen with plain Java URLConnection then?

Does it happen with plain Java URLConnection then?

Yes it does.

Krenair renamed this task from SSL connections to beta cluster not working to Some Java clients unable to handle beta cluster TLS.Apr 17 2019, 2:23 PM

@Dbrant: @Vgutierrez pointed out the OCSP stapling served by nginx was out of date, we found the puppet manifest had not deployed the hook that causes it to reload nginx when the file changes. Try now, at least on the ones showing the revocation check failure

Change 504571 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] tlsproxy: Ensure OCSP stapling nginx reload hook present for acme-chief

https://gerrit.wikimedia.org/r/504571

@Krenair And... that seems to be working! Thanks for digging into it.

Has it had any effect on the other clients?

It's now working correctly on all of my Android devices. It's actually still failing in my mock app on Windows, but I'm suspecting that it really is a root certificate issue specific to Windows (since it was also failing with librenms), and therefore unrelated.
And I've asked the Commons team to report whether they're still having the issue, but I'm hopeful that this is the correct solution.

Excellent. Thanks for reporting this and all the help debugging @Dbrant.

@Dbrant I have tested the latest betaDebug build for the Commons Android app on Android Device with API Level 28 and it now works fine.

Change 504571 merged by Vgutierrez:
[operations/puppet@production] tlsproxy: Ensure OCSP stapling nginx reload hook present for acme-chief

https://gerrit.wikimedia.org/r/504571