Page MenuHomePhabricator

ATS SSL session cache doesn't work
Closed, ResolvedPublic

Description

ATS SSL session cache is completely broken (443 is ATS and 4443 nginx):

vgutierrez@cp5001:~$ openssl s_client -connect 127.0.0.1:443 -reconnect < /dev/null 2>1 |egrep -i "reconnect|reused"
drop connection and then reconnect
drop connection and then reconnect
drop connection and then reconnect
drop connection and then reconnect
drop connection and then reconnect
vgutierrez@cp5001:~$ openssl s_client -connect 127.0.0.1:4443 -reconnect < /dev/null 2>1 |egrep -i "reconnect|reused"
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305

Current session cache settings:

vgutierrez@cp5001:~$ sudo -i traffic_ctl --run-root=/srv/trafficserver/tls config match session_cache
NOTE: using command line path as RUNROOT
proxy.config.ssl.session_cache: 2
proxy.config.ssl.session_cache.auto_clear: 1
proxy.config.ssl.session_cache.size: 512000
proxy.config.ssl.session_cache.num_buckets: 32768
proxy.config.ssl.session_cache.skip_cache_on_bucket_contention: 0
proxy.config.ssl.session_cache.timeout: 0

Note that timeout should be ignored when proxy.config.ssl.session_cache: 2 but still if wouldn't be ignored, OpenSSL should set it for us to 300 seconds by default, but it doesn't matter.. debugging shows that exactly the same happens even with timeout set to 30000

I can reproduce this behaviour using the internal OpenSSL session cache with ATS

Event Timeline

Vgutierrez triaged this task as High priority.Sep 12 2019, 10:58 AM
Vgutierrez created this task.

Enabling the session cache debug on a local instance shows this:

willikins:~ vgutierrez$ docker logs -f ats_ats_1 |fgrep timeout
[E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
[Sep 12 11:00:15.807] {0x2ba5dff86080} DEBUG: <SSLUtils.cc:1600 (SSLInitServerContext)> (ssl.session_cache) ssl context=0x234dbc0: using session cache options, enabled=2, size=102400, num_buckets=256, skip_on_contention=0, timeout=30000, auto_clear=0
^C
willikins:~ vgutierrez$ docker logs -f ats_ats_1 |fgrep 4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1
[E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
[Sep 12 11:01:14.215] {0x2ba5e78d7700} DEBUG: <SSLUtils.cc:285 (ssl_new_cached_session)> (ssl.session_cache.insert) ssl_new_cached_session session '4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1' and context 0x234dbc0
[Sep 12 11:01:14.215] {0x2ba5e78d7700} DEBUG: <SSLSessionCache.cc:108 (insertSession)> (ssl.session_cache.insert) SessionCache using bucket 5 (0x2347398): Inserting session '4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1' (hash: B12CFBF767630005).
[Sep 12 11:01:14.215] {0x2ba5e78d7700} DEBUG: <SSLSessionCache.cc:127 (insertSession)> (ssl.session_cache) Inserting session '4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1' to bucket 0x2347398.
[Sep 12 11:01:27.146] {0x2ba5e76d3700} DEBUG: <SSLUtils.cc:241 (ssl_get_cached_session)> (ssl.session_cache.get) ssl_get_cached_session cached session '4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1' context 0x234dbc0
[Sep 12 11:01:27.146] {0x2ba5e76d3700} DEBUG: <SSLSessionCache.cc:71 (getSession)> (ssl.session_cache.get) SessionCache looking in bucket 5 (0x2347398) for session '4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1' (hash: B12CFBF767630005).
[Sep 12 11:01:27.146] {0x2ba5e76d3700} DEBUG: <SSLSessionCache.cc:213 (getSession)> (ssl.session_cache) Looking for session with id '4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1' in bucket 0x2347398
[Sep 12 11:01:27.149] {0x2ba5e76d3700} DEBUG: <SSLUtils.cc:318 (ssl_rm_cached_session)> (ssl.session_cache.remove) ssl_rm_cached_session cached session '4C2F274AD6F44519402A6B02E4867DD22A02BFB53067A5F505006367F7FB2CB1'

even with auto_clear set to 0 and timeout to 30000 seconds the SSL session gets evicted way faster than expected

Vgutierrez updated the task description. (Show Details)Sep 12 2019, 11:05 AM
Vgutierrez moved this task from Triage to TLS on the Traffic board.Sep 12 2019, 11:08 AM

After debugging why the eviction is triggered:

#0  ssl_rm_cached_session (ctx=0x188db10, sess=0x2ad5e802a910) at SSLUtils.cc:304
#1  0x00002ad5d30030f9 in remove_session_lock (ctx=0x188db10, c=0x2ad5e802a910, lck=<optimized out>) at ../ssl/ssl_sess.c:738
#2  0x00002ad5d3003eff in ssl_clear_bad_session (s=s@entry=0x2ad5e8028870) at ../ssl/ssl_sess.c:1032
#3  0x00002ad5d2ffefe6 in SSL_free (s=0x2ad5e8028870) at ../ssl/ssl_lib.c:1145
#4  0x0000000000669128 in SSLNetVConnection::clear (this=0x2ad618020bf0) at SSLNetVConnection.cc:920
#5  0x00000000006692e4 in SSLNetVConnection::free (this=0x2ad618020bf0, t=0x2ad5d8d6c010) at SSLNetVConnection.cc:954
#6  0x000000000068b4d3 in read_signal_and_update (event=104, vc=0x2ad618020bf0) at UnixNetVConnection.cc:102
#7  0x000000000068b2be in read_signal_done (event=25746192, nh=0x2ad5d8d6fcf0, vc=0x2ad618020bf0) at UnixNetVConnection.cc:144
#8  UnixNetVConnection::readSignalDone (this=0x2ad618020bf0, event=25746192, nh=0x2ad5d8d6fcf0) at UnixNetVConnection.cc:1041
#9  0x000000000066729a in SSLNetVConnection::net_read_io (this=0x2ad618020bf0, nh=0x2ad5d8d6fcf0, lthread=<optimized out>) at SSLNetVConnection.cc:696
#10 0x000000000067f6ef in NetHandler::process_ready_list (this=0x2ad5d8d6fcf0) at UnixNet.cc:395
#11 0x000000000067ff8c in NetHandler::waitForActivity (this=<optimized out>, timeout=<optimized out>) at UnixNet.cc:528
#12 0x000000000068006a in non-virtual thunk to NetHandler::waitForActivity(long) ()
#13 0x00000000006ae762 in EThread::execute_regular (this=0x2ad5d8d6c010) at UnixEThread.cc:272
#14 0x00000000006ad369 in spawn_thread_internal (a=0x17f0aa0) at Thread.cc:85
#15 0x00002ad5d39dd4a4 in start_thread (arg=0x2ad5d917d700) at pthread_create.c:456
#16 0x00002ad5d465bd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

is OpenSSL itself that triggers it via SSL_free() --> ssl_clear_bad_session()

Vgutierrez added a comment.EditedSep 12 2019, 6:36 PM

It looks like the culprit is https://github.com/apache/trafficserver/commit/03734d05e28af8a7b105a0579056c913fb5d1bc5, I've tested a pretty naive fix on 8.0.5-1wm7 and it works:

willikins:~ vgutierrez$ openssl s_client -connect 127.0.0.1:3129 -reconnect < /dev/null 2>&1| egrep -i "reconnect|reused"
drop connection and then reconnect
Reused, TLSv1/SSLv3, Cipher is ECDHE-ECDSA-AES256-GCM-SHA384
drop connection and then reconnect
Reused, TLSv1/SSLv3, Cipher is ECDHE-ECDSA-AES256-GCM-SHA384
drop connection and then reconnect
Reused, TLSv1/SSLv3, Cipher is ECDHE-ECDSA-AES256-GCM-SHA384
drop connection and then reconnect
Reused, TLSv1/SSLv3, Cipher is ECDHE-ECDSA-AES256-GCM-SHA384
drop connection and then reconnect
Reused, TLSv1/SSLv3, Cipher is ECDHE-ECDSA-AES256-GCM-SHA384

Reported and proposed a solution to upstream in https://github.com/apache/trafficserver/pull/5935

Mentioned in SAL (#wikimedia-operations) [2019-09-17T05:56:22Z] <vgutierrez> uploaded trafficserver 8.0.5-1wm7 to apt.wikimedia.org (stretch) - T232298 T232724

Mentioned in SAL (#wikimedia-operations) [2019-09-17T05:57:16Z] <vgutierrez> upgrading ATS to 8.0.5-1wm7 on cp2002 and cp4021 - T232724

Vgutierrez closed this task as Resolved.Tue, Sep 17, 6:02 AM

Issue looks solved after upgrading to 8.0.5-1wm7:

vgutierrez@cp4021:~$ openssl s_client -connect 127.0.0.1:443 -reconnect < /dev/null 2>1 |egrep -i "reconnect|reused"
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305
drop connection and then reconnect
Reused, TLSv1.2, Cipher is ECDHE-ECDSA-CHACHA20-POLY1305

Mentioned in SAL (#wikimedia-operations) [2019-09-17T08:37:09Z] <vgutierrez> upgrading ATS to 8.0.5-1wm8 on cp3034 - T231849 T232724