
TLS security review of the Kafka stack
Closed, Resolved · Public · 13 Estimated Story Points

Description

After adding a varnishkafka tls-enabled instance on cp1008 (cache::canary) I had an interesting chat with @BBlack about the following points:

  • We clearly care about the client-side's security params because we don't want the client to be fooled into connecting to an illegitimate "broker" and feeding it sensitive data.
  • We don't want to allow illegitimate clients to connect to the broker either (sending fake/confusing data, or as the output of some kind of proxy after capturing the real client's connection).
  • Neither side should allow negotiation tricks that still result in a true end-to-end connection between our legit clients+brokers, but one with bad security properties that make it more sniffable than we'd expect.

Currently a Kafka Jumbo broker shows old/insecure algorithms/protocols when trying to connect:

CN=Puppet CA: palladium.eqiad.wmnet
Client Certificate Types: RSA sign, DSA sign, ECDSA sign
Requested Signature Algorithms: ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:ECDSA+SHA256:RSA+SHA256:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
Shared Requested Signature Algorithms: ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:ECDSA+SHA256:RSA+SHA256:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
Peer signing digest: SHA512
Server Temp Key: ECDH, P-256, 256 bits

Ideally, since we control both sides of the communication, we want to force clients to use TLSv2 + ECDHE-ECDSA-AES256-GCM-SHA384 (or a similar combination).

Some notes about things to test/verify:

  • Additional Kafka broker TLS settings (quoting https://docs.confluent.io/current/kafka/ssl.html), like:
    • ssl.cipher.suites (Optional). A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol.
    • ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1. It should list at least one of the protocols configured on the broker side.
  • Additional librdkafka TLS settings like ssl.cipher.suites. Interestingly, librdkafka does not support any option that forces it to use a specific TLS version. Does this mean that we'd need to follow up upstream? Does ssl.cipher.suites imply a specific version of TLS?
  • Are the TLS implementations used (the Open-JDK one for Kafka and OpenSSL for librdkafka) recent enough to avoid corner cases that could cause downgrades of the TLS session parameters?
  • Anything else?

Event Timeline

elukey triaged this task as Medium priority. Dec 15 2017, 3:34 PM

we want to force clients to use TLSv2

Did you mean TLSv1.2?

Does ssl.cipher.suites imply a specific version of TLS?

I think not on its own, but as far as I can tell, some ciphers are only supported by certain versions of TLS. If we go with Brandon's suggestion (which I am testing in labs now, it is working just fine), then it looks to me like TLSv1.2 is forced, as ECDHE-ECDSA-AES256-GCM-SHA384 is not supported by other versions:

otto@k3-1:~$ openssl ciphers -s -tls1_2 | grep ECDHE-ECDSA-AES256-GCM-SHA384  | wc -l
1
otto@k3-1:~$ openssl ciphers -s -tls1_1 | grep ECDHE-ECDSA-AES256-GCM-SHA384  | wc -l
0
otto@k3-1:~$ openssl ciphers -s -tls1 | grep ECDHE-ECDSA-AES256-GCM-SHA384  | wc -l
0

In labs, I have on broker:

ssl.enabled.protocols=TLSv1.2
ssl.cipher.suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

and in varnishkafka.conf:

kafka.ssl.cipher.suites=ECDHE-ECDSA-AES256-GCM-SHA384

And everything seems to work great.
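The same check as the openssl ciphers commands above can be sketched with Python's standard ssl module, which wraps whatever OpenSSL the interpreter was built against. This is only an illustration under that assumption, not part of the Kafka or varnishkafka setup; the helper name suite_protocol is made up for the example. It asks OpenSSL which protocol version it associates with the suite configured in varnishkafka.conf.

```python
import ssl

# OpenSSL-style name, as used in varnishkafka.conf. The broker side uses the
# JSSE/IANA-style name TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 for the same suite.
SUITE = "ECDHE-ECDSA-AES256-GCM-SHA384"


def suite_protocol(openssl_name):
    """Return the protocol version OpenSSL reports for a cipher suite name.

    Hypothetical helper for illustration. Returns None if the suite is not
    listed after being selected.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Raises ssl.SSLError if the local OpenSSL does not know the suite.
    ctx.set_ciphers(openssl_name)
    for cipher in ctx.get_ciphers():
        # TLS 1.3 suites are configured separately in OpenSSL and may still
        # be listed, so select the requested entry by name.
        if cipher["name"] == openssl_name:
            return cipher["protocol"]
    return None


if __name__ == "__main__":
    print(SUITE, "->", suite_protocol(SUITE))
```

The result depends on the local OpenSSL build, so treat it as a quick sanity check rather than a statement about the broker's JSSE implementation.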

Are the TLS implementations used [...] recent enough

I'm not sure if there are more specific versions than TLSv1.2, and as far as I know TLSv1.2 is the latest.

Patches coming in to configure ^ :)

https://en.wikipedia.org/wiki/There_are_known_knowns

  • Known Knowns:
    • Restrict ciphersuite selection to a strong FS+AEAD option. Best would be x25519+chapoly, but prime256v1+aes256-gcm is acceptable as well and may be the best we can shoot for given implementation restrictions on the Java side. (kafka.ssl.cipher.suites=ECDHE-ECDSA-AES256-GCM-SHA384 on both sides). And yes, this incidentally only works with TLSv1.2.
  • Known Unknowns
    • Is restricting protocol to TLSv1.2 explicitly worth any gain on top of the above, given librdkafka has no config for it and would need patching? We can at least do it on the Java side of the connection. Need to do a little reading on whether the client-side allowance for lesser versions has any impact that matters security-wise, even if the ciphersuite would supposedly ultimately fail later in the process...
    • The sigalgs lists being negotiated for mutual certificate-based auth seem to include some weak options, e.g. DSA-based options, SHA1-based options, etc. I was under the impression this is something that our usual TLS negotiators (modern browsers and nginx+openssl) restrict a bit further. Is this actually an issue? If so, can we find a way to configure to only allow strong options here? Being able to fake the authentication in one or both directions would allow MITM in spite of strong proto/cipher choices.
  • Unknown Unknowns
    • Should do some quick vetting that librdkafka's use of OpenSSL API is secure. Note it links against libssl1.0, not 1.1 (which is neither here nor there on being secure I guess, but it's notable in that secure use of the library is a little different under each).
    • Should figure out for sure which TLS implementation the server side is using (via Java crypto libs, which AIUI from IRC should be using some version of NSS from Mozilla under its own hood?), and what possible issues it may or may not be up to speed on (e.g. lack of various mitigations for minor insecurities that have cropped up and been addressed in browsers/nginx/openssl).

Is restricting protocol to TLSv1.2 explicitly worth any gain on top of the above, given librdkafka has no config for it and would need patching?

I would say not? If the cipher is restricted to something only supported by TLSv1.2, then the connection to Kafka will fail eventually anyway if something tried to use a lesser TLS version.

The sigalgs lists being negotiated for mutual certificate-based auth seem to include some weak options [...] can we find a way to configure to only allow strong options here?

Ah, my patch didn't link? https://gerrit.wikimedia.org/r/#/c/399700/ This sets TLSv1.2 only at the Kafka broker, and also forces both clients and brokers to only allow the ECDHE-ECDSA-AES256-GCM-SHA384 ciphersuite. That should do it, no?

Should figure out for sure which TLS implementation the server side is using

K! Kafka SSLTransportLayer uses javax.net.ssl with SSL Engine, which is part of the standard JSSE implementation provided with Java 8. I'm pretty sure this is not Mozilla NSS/JSS.

and what possible issues it may or may not be up to speed

I don't know how to compare JSSE to other implementations, e.g. openssl. We use OpenJDK via Debian, so we'd be relying on them for security releases. http://openjdk.java.net/groups/security/ says that vulnerabilities should be reported to Oracle.

http://www.oracle.com/technetwork/java/javase/8all-relnotes-2226344.html has release notes for Oracle Java 8, and lists SSL improvements too. https://www.oracle.com/support/assurance/vulnerability-remediation/security-fixing.html describes their process for releasing security patches. I'm not sure how quickly these make it into OpenJDK and Debian. Maybe @MoritzMuehlenhoff has more details here? He's always making Luca restart daemons for JVM security updates. :)

K! Kafka SSLTransportLayer uses javax.net.ssl with SSL Engine, which is part of the standard JSSE implementation provided with Java 8. I'm pretty sure this is not Mozilla NSS/JSS.

OpenJDK uses NSS, but only for the pkcs11 classes. The cryptographic engines are configured via /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/security/java.security, with sun.security.provider.Sun being the default.

and what possible issues it may or may not be up to speed

I don't know how to compare JSSE to other implementations, e.g. openssl.

They do fix protocol-level flaws in their implementation, though not as quickly as openssl (and the performance is also not as good; there are some older benchmarks comparing the NSS provider against the internal one: http://blog.fuseyism.com/index.php/2012/03/09/openjdk-icedtea-nss/), but I think it's an acceptable level.

We use OpenJDK via Debian, so we'd be relying on them for security releases. http://openjdk.java.net/groups/security/ says that vulnerabilities should be reported to Oracle.

http://www.oracle.com/technetwork/java/javase/8all-relnotes-2226344.html has release notes for Oracle Java 8, and lists SSL improvements too. https://www.oracle.com/support/assurance/vulnerability-remediation/security-fixing.html describes their process for releasing security patches. I'm not sure how quickly these make it into OpenJDK and Debian. Maybe @MoritzMuehlenhoff has more details here? He's always making Luca restart daemons for JVM security updates. :)

Pretty quickly. Oracle Java is closed-source, and the respective open source releases managed by Red Hat (OpenJDK and IcedTea, which are the same thing under two names, to add confusion) usually follow a few days later (sometimes a few weeks for older branches). Those releases replace some non-free bits from Oracle Java and make it more usable on modern platforms (e.g. by adding PulseAudio support). OpenJDK is the default in all distros, but e.g. Red Hat also ships Oracle Java as a supplement for some quirky applications which rely on Oracle Java (mostly when bound by support contracts; it's rarely a software issue).

The sigalgs lists being negotiated for mutual certificate-based auth seem to include some weak options

Ah I just realized what you meant by this, and I didn't answer. I don't think we can restrict this, other than only generating certs with stronger algs. The certs we've generated so far are all EC SECP256R1 AKA prime256v1 AKA NIST P-256.

The question is whether merely allowing weak sigalgs is an issue here when looking from the perspective of men-in-the-middle. If the server and/or client are willing to accept certificates from the other side which appear to be signed by the correct authority, but use a weaker/older sigalg like RSA+SHA1, does that open the door to the MITM being able to set up a semi-transparent proxy using forged certificates in one or both directions?

OO I have done some research!

We can configure a JVM to deny certain algorithms used in a cert chain.

I tested this by generating an RSA 2048 key and certificate. I edited /etc/java-8-openjdk/security/java.security to change

jdk.certpath.disabledAlgorithms=MD2, MD5, SHA1 jdkCA & usage TLSServer, RSA keySize < 1024, DSA keySize < 1024, EC keySize < 224

to

jdk.certpath.disabledAlgorithms=MD2, MD5, SHA1 jdkCA & usage TLSServer, RSA, DSA, EC keySize < 224

Disabling RSA usage. I verified that an RSA key was able to be used with Kafka before I changed the setting, and was not able to be used after.
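To make the effect of the two settings easier to compare, here is a deliberately simplified Python model of how a jdk.certpath.disabledAlgorithms value could be read. It is an assumption-laden sketch, not the JDK's actual checker: it only understands bare algorithm names and "keySize < N" constraints, ignores other constraints such as "jdkCA & usage TLSServer", and the function names are invented for the example.

```python
import re

# The two values quoted above: the Java 8 default and the edited one.
DEFAULT = ("MD2, MD5, SHA1 jdkCA & usage TLSServer, RSA keySize < 1024, "
           "DSA keySize < 1024, EC keySize < 224")
RESTRICTED = "MD2, MD5, SHA1 jdkCA & usage TLSServer, RSA, DSA, EC keySize < 224"


def parse_disabled_algorithms(value):
    """Split the property value into (algorithm, constraint) pairs."""
    rules = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, constraint = entry.partition(" ")
        rules.append((name, constraint.strip()))
    return rules


def key_rejected(rules, algorithm, key_size):
    """Return True if this simplified model would reject the key.

    A rule with no constraint disables the algorithm outright; a
    'keySize < N' rule disables keys smaller than N bits. Any other
    constraint is NOT modelled here and is treated as not matching.
    """
    for name, constraint in rules:
        if name != algorithm:
            continue
        if not constraint:
            return True
        match = re.fullmatch(r"keySize\s*<\s*(\d+)", constraint)
        if match and key_size < int(match.group(1)):
            return True
    return False


if __name__ == "__main__":
    for label, value in (("default", DEFAULT), ("restricted", RESTRICTED)):
        rules = parse_disabled_algorithms(value)
        print(label, "rejects RSA 2048:", key_rejected(rules, "RSA", 2048))
        print(label, "rejects EC 256:", key_rejected(rules, "EC", 256))
```

In this model an RSA 2048 key passes the default value and is rejected by the edited one, while an EC 256 key passes both, which matches the behaviour described above.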

What algorithms do we want to disable specifically? Should we just restrict usage to EC only?

It should be possible to configure this per JVM process rather than for all JVMs by overriding this in a -Djava.security.properties file, but I haven't been able to make that work yet.

That looks about right (disable all hashes older than SHA256, disable RSA+DSA), although it's hard to suss exactly what the effect of SHA1 jdkCA & usage TLSServer is in that list. Does that mean SHA1 is disabled, except in the cases that it's the root cert of a chain stored in the jdkCA's default store (e.g. list of public CAs)?

Does that mean SHA1 is disabled, except in the cases that it's the root cert of a chain stored in the jdkCA's default store (e.g. list of public CAs)?

Each rule is comma separated. The unchanged line I pasted is the default settings for Java 8. SHA1 jdkCA & usage TLSServer means that when being used as a TLSServer, SHA1 cannot be used with the provided JDK CA. So, no SHA1 certs in the JDK CA chain will be accepted.

It sounds like we want to disable SHA1 more generally too, so I could just remove the jdkCA & usage TLSServer part, and they'd be disabled for all uses:

jdk.certpath.disabledAlgorithms=MD2, MD5, SHA1, RSA, DSA, EC keySize < 224

Ya? Should we just puppetize this for all JVMs on Kafka boxes? I've not yet figured out how to make it work with a single JVM process. Will keep trying though.

Yeah, seems reasonable to just set it system-wide on these systems.

Change 403415 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set jdk.certpath.disabledAlgorithms in java.security on Kafka brokers

https://gerrit.wikimedia.org/r/403415

Change 403415 merged by Ottomata:
[operations/puppet@production] Set jdk.certpath.disabledAlgorithms in java.security on Kafka brokers

https://gerrit.wikimedia.org/r/403415

Oook, I've set this on all jumbo Kafka brokers. @BBlack anything else?

Here's a Q:

In cergen, I'm generating EC keys using the SECG curve secp256r1, also called NIST P-256. The python cryptography lib says:

Currently cryptography only supports NIST curves, none of which are considered “safe” by the SafeCurves project run by Daniel J. Bernstein and Tanja Lange.

SafeCurves: https://safecurves.cr.yp.to/

Is this a problem?

No, it's not a problem. For certificates, NIST P-256 (aka secp256r1, aka prime256v1, depending on who's talking) is really the best reasonable choice at this time. Eventually, things will move towards Ed25519 and/or Ed448 certificates, which are based on safer curves, but the standards and software stacks (and public CAs, in the public TLS case) aren't ready for it yet. We'll probably see these adopted more broadly after the transition to TLSv1.3.

On the other hand, X25519 is a useful (but not strictly necessary!) upgrade to the same NIST P-256 curve for ECDHE key exchange purposes, where possible, but that's not a cergen issue, that's a TLS connection negotiation issue (we should probably look at constraining allowable ECDHE curves as well, to either just NIST P-256 or that and X25519, if possible, avoiding the tons of lesser-known and lesser-tested curves available by default in some implementation versions).

Oook, I've set this [restricted certpath algorithms] on all jumbo Kafka brokers.

Welp, something is totally crazy with Puppet CA signed certificates vs ones created by a self signed certificate (via python cryptography).

jdk.certpath.disabledAlgorithms=MD2, MD5, SHA1, RSA, DSA, EC keySize < 224 somehow does not allow Puppet-signed certs to be used, even though they are using EC keys. Investigating...

Change 403753 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Revert yesterday's change to kafka-jumbo java.security

https://gerrit.wikimedia.org/r/403753

Change 403753 merged by Ottomata:
[operations/puppet@production] Revert yesterday's change to kafka-jumbo java.security

https://gerrit.wikimedia.org/r/403753

Change 403762 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Allow certificates RSA keySize > 2048, Puppet generates certs like these

https://gerrit.wikimedia.org/r/403762

Change 403762 merged by Ottomata:
[operations/puppet@production] Allow certificates RSA keySize > 2048, Puppet generates certs like these

https://gerrit.wikimedia.org/r/403762

Change 403774 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Also disable SHA224

https://gerrit.wikimedia.org/r/403774

Change 403774 merged by Ottomata:
[operations/puppet@production] Also disable SHA224

https://gerrit.wikimedia.org/r/403774

Current status:

kafka-jumbo running with Requested Signature Algorithms: ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:ECDSA+SHA256:RSA+SHA256 (openssl s_client -connect kafka-jumbo1003.eqiad.wmnet:9093), and varnishkafka-webrequest-jumbo on cp1008 canary is running with librdkafka 0.11.3 built with libssl1.1.
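A small Python sketch for comparing the sigalgs list from the task description with the current one. The definition of "weak" used here (DSA signatures, SHA1 and SHA224 digests) is an assumption taken from the discussion in this thread, and the function name is made up for the example.

```python
# Assumption from the discussion above: DSA, SHA1 and SHA224 count as weak.
WEAK_SIGNATURES = {"DSA"}
WEAK_HASHES = {"SHA1", "SHA224"}

# "Requested Signature Algorithms" as shown by openssl s_client,
# before and after the java.security changes.
BEFORE = ("ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:ECDSA+SHA256:"
          "RSA+SHA256:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:"
          "ECDSA+SHA1:RSA+SHA1:DSA+SHA1")
AFTER = "ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:ECDSA+SHA256:RSA+SHA256"


def weak_sigalgs(sigalg_list):
    """Return the entries of a colon-separated sigalgs list considered weak."""
    weak = []
    for entry in sigalg_list.split(":"):
        signature, _, digest = entry.partition("+")
        if signature in WEAK_SIGNATURES or digest in WEAK_HASHES:
            weak.append(entry)
    return weak


if __name__ == "__main__":
    print("before:", weak_sigalgs(BEFORE))
    print("after: ", weak_sigalgs(AFTER))
```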

Ottomata updated the task description.
Ottomata changed the point value for this task from 0 to 13.

Meta-update since this is quite stalled out now. I'll try to line up all the explanatory bits here that are affecting process or timeline, in a more-orderly fashion and not in an IRC conversation:

  • We don't want to flip our analytics streams over from being secured by ipsec to being secured by TLS (without ipsec) without a thorough sec review that the TLS configs/libraries prevent reasonable/known attacks, as outlined in some points above.
    • But, it's important to stress that particular points raised above, e.g. a particular observed sigalgs list, are just examples found in a quick check which point to the need for review.
    • The key points that make this different from other TLS deployments here: the server side is using Java's crypto libraries (which may be partly backed by NSS in the case of our particular JVM), and both the client and server sides' TLS configuration capabilities seem limited. Therefore there's no reasonable expectation, without digging deeper, that this is as secure as we'd expect from "standard" TLS between e.g. a modern browser and our nginx terminators.
    • Nobody's actually done a comprehensive review of all the possible issues, yet, AFAIK.
    • So there's not a checklist to run down here of particular issues and then we're done. We're not even at that stage of the process.
  • Without assigning any fault (which could as likely be mine as anyone's!), clearly we (the organization) didn't realize the necessity of, and/or budget time for this review work in completing the goal here.
  • My assumption when the "Kafka TLS for webrequest" topic was raised in the past, which led to at least my part of not realizing the above, was that we'd turn on TLS using the existing brokers, within the secure ipsec tunneling we already have in place, and then later look at removing ipsec once we're comfortable. Ordering the process in that way removes this review as a blocker for TLS deployment and makes the whole thing less of a critical point-in-time event and more of a smooth transition of security models.
  • The engineering/deployment plan that's actually trying to move forward now is different from what I expected above: TLS is enabled for a new set of brokers which do not have ipsec, and the expectation seems to be that we switch from one cluster to the other (and thus from ipsec -> TLS as the sole guarantor of the traffic's privacy) without overlap. This particular plan is what warrants the security review being a hard blocker.
  • We don't have many people here who can do this review thoroughly, we hadn't planned anyone's time to do it, and we're perpetually busy. Over the past ~7 weeks or so since @elukey created this task after the initial IRC conversation, there have been a lot of holidays, vacations, and travel for everyone involved. I (and others) have looked at this issue sporadically when repeatedly prodded on IRC, but I haven't set aside time to dig through all of this thoroughly, and I don't foresee having the time to devote to this for at least another week or two (at best, no guarantees!). We have a new employee starting next week who will be working in just the right area to do this review as well, but this isn't the kind of thing one does on their first week here, either :)

Probably the fastest path forward for unblocking deployment would be to (as suggested a few weeks ago on IRC) deploy ipsec for the new brokers, which makes this review a non-blocker that can wait for the eventual removal of ipsec protection. Is there any particular reason we can't move forward with that as the plan? Does it get in the way of something else?

Thanks @BBlack, it's at least good to know that we'll need to do the IPSec thing or this will block us for a long while. I'll discuss with @elukey and decide what to do (likely enable IPSec on the new brokers).

It seems the best solution to unblock us and move all our Analytics consumers to the new Kafka Jumbo. I am not sure how difficult/invasive it is for a host to get back to an ipsec-less configuration later on, but it is probably doable with a brief maintenance. So let's set up ipsec from all the cp hosts to kafka jumbo!

We have a new employee starting next week who will be working in just the right area to do this review as well, but this isn't the kind of thing one does on their first week here, either :)

Understood. Let's please make sure this task is prioritized so that he can look at it as soon as possible, though.

ping @BBlack could we possibly get this review done next quarter?

Yes, we've had this on the discussion list for ops Q4 goals (the elimination of the need for ipsec for caches<->kafka-brokers), and Traffic signed up to guarantee the time for it. Goals not final yet, of course.

@BBlack did this end up being a Q4 goal for traffic team?

I think this ended up being an Analytics Q4 goal? It's not on our goals list, but we agreed to allot some time to it this quarter to do our part (which is most of it, confusingly!), and have discussed it internally.

Ok, great! From our side, we're mostly looking for either more TODOs and/or approval to remove IPSec from jumbo + varnishkafkas for this goal, ya?

@BBlack would you mind if I assigned this to someone on your team?

Right now the TLS server allows the client to pick the curve to use. Since Java 8u121 (8u171-b11-1~deb9u1 is deployed on kafka-jumbo hosts), this can be configured through the system property jdk.tls.namedGroups:

-Djdk.tls.namedGroups="secp256r1" # this would mimic the behaviour of ssl_ciphersuite.rb regarding curves

https://github.com/wikimedia/puppet/blob/production/modules/wmflib/lib/puppet/parser/functions/ssl_ciphersuite.rb#L216

Reference: http://www.oracle.com/technetwork/java/javase/8u121-relnotes-3315208.html / https://bugs.openjdk.java.net/browse/JDK-8148516

Current behaviour:

Curves ordering: client - fallback: no
Accepted curves: sect283k1,sect283r1,sect409k1,sect409r1,sect571k1,sect571r1,secp256k1,secp256r1
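As a rough illustration of what the restriction changes, the Python sketch below takes the "Accepted curves" list above and an allow-list, and derives the JVM flag plus the curves that would no longer be offered. The helper name and the allow-list are assumptions for the example; only the jdk.tls.namedGroups property name and the curve names come from this thread.

```python
# Curves the broker currently accepts, as reported above.
ACCEPTED = ("sect283k1,sect283r1,sect409k1,sect409r1,"
            "sect571k1,sect571r1,secp256k1,secp256r1")


def restrict_named_groups(accepted, allowed):
    """Build a -Djdk.tls.namedGroups flag limited to the allowed curves.

    Returns (flag, dropped), where dropped lists the currently accepted
    curves that the flag would exclude. Hypothetical helper.
    """
    curves = [c.strip() for c in accepted.split(",") if c.strip()]
    kept = [c for c in curves if c in allowed]
    dropped = [c for c in curves if c not in allowed]
    return "-Djdk.tls.namedGroups=" + ",".join(kept), dropped


if __name__ == "__main__":
    flag, dropped = restrict_named_groups(ACCEPTED, {"secp256r1"})
    print(flag)
    print("no longer accepted:", ", ".join(dropped))
```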

Change 433214 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set jdk.tls.namedGroups=secp256r1 for Kafka TLS

https://gerrit.wikimedia.org/r/433214

Change 433214 merged by Ottomata:
[operations/puppet@production] Set jdk.tls.namedGroups=secp256r1 for Kafka TLS

https://gerrit.wikimedia.org/r/433214

Mentioned in SAL (#wikimedia-operations) [2018-05-16T13:39:59Z] <ottomata> rolling restart of Kafka jumbo brokers to apply jdk.tls.namedGroups=secp256r1 https://phabricator.wikimedia.org/T182993

I've just submitted a PR to librdkafka to be able to control the certificate signature algorithms and the elliptic curves offered by varnishkafka. With the current librdkafka version you can see during the TLS handshake that some insecure signature algorithms are being offered by varnishkafka:

tshark output
Extension: signature_algorithms (len=32)
    Type: signature_algorithms (13)
    Length: 32
    Signature Hash Algorithms Length: 30
    Signature Hash Algorithms (15 algorithms)
        Signature Algorithm: rsa_pkcs1_sha512 (0x0601)
            Signature Hash Algorithm Hash: SHA512 (6)
            Signature Hash Algorithm Signature: RSA (1)
        Signature Algorithm: SHA512 DSA (0x0602)
            Signature Hash Algorithm Hash: SHA512 (6)
            Signature Hash Algorithm Signature: DSA (2)
        Signature Algorithm: ecdsa_secp521r1_sha512 (0x0603)
            Signature Hash Algorithm Hash: SHA512 (6)
            Signature Hash Algorithm Signature: ECDSA (3)
        Signature Algorithm: rsa_pkcs1_sha384 (0x0501)
            Signature Hash Algorithm Hash: SHA384 (5)
            Signature Hash Algorithm Signature: RSA (1)
        Signature Algorithm: SHA384 DSA (0x0502)
            Signature Hash Algorithm Hash: SHA384 (5)
            Signature Hash Algorithm Signature: DSA (2)
        Signature Algorithm: ecdsa_secp384r1_sha384 (0x0503)
            Signature Hash Algorithm Hash: SHA384 (5)
            Signature Hash Algorithm Signature: ECDSA (3)
        Signature Algorithm: rsa_pkcs1_sha256 (0x0401)
            Signature Hash Algorithm Hash: SHA256 (4)
            Signature Hash Algorithm Signature: RSA (1)
        Signature Algorithm: SHA256 DSA (0x0402)
            Signature Hash Algorithm Hash: SHA256 (4)
            Signature Hash Algorithm Signature: DSA (2)
        Signature Algorithm: ecdsa_secp256r1_sha256 (0x0403)
            Signature Hash Algorithm Hash: SHA256 (4)
            Signature Hash Algorithm Signature: ECDSA (3)
        Signature Algorithm: SHA224 RSA (0x0301)
            Signature Hash Algorithm Hash: SHA224 (3)
            Signature Hash Algorithm Signature: RSA (1)
        Signature Algorithm: SHA224 DSA (0x0302)
            Signature Hash Algorithm Hash: SHA224 (3)
            Signature Hash Algorithm Signature: DSA (2)
        Signature Algorithm: SHA224 ECDSA (0x0303)
            Signature Hash Algorithm Hash: SHA224 (3)
            Signature Hash Algorithm Signature: ECDSA (3)
        Signature Algorithm: rsa_pkcs1_sha1 (0x0201)
            Signature Hash Algorithm Hash: SHA1 (2)
            Signature Hash Algorithm Signature: RSA (1)
        Signature Algorithm: SHA1 DSA (0x0202)
            Signature Hash Algorithm Hash: SHA1 (2)
            Signature Hash Algorithm Signature: DSA (2)
        Signature Algorithm: ecdsa_sha1 (0x0203)
            Signature Hash Algorithm Hash: SHA1 (2)
            Signature Hash Algorithm Signature: ECDSA (3)

The TLS ClientHello can be analyzed with tshark with the following filters: -d tcp.port==9093,ssl -Y "ssl.handshake.type == 1" -O ssl

It would be nice to be able to use the X25519 curve here; OpenSSL has supported X25519 since version 1.1.0. On the JVM side, support is scheduled for JVM 11 and is being tracked in JEP 324.

Hm, ya, sounds like a way off before we get that in Debian then, ya? Is that something that would block removal of IPSec?

it isn't a stopper for us now, but I just want to track it to be able to use it when it's available.

Reporting an IRC discussion here. It would be great to make a list of next steps for:

  • remove IPSEC completely between Jumbo and Caching hosts.
  • nice to have (but not mandatory for the above step) that may possibly be implemented in the future.

Also we should list which next steps can be done in parallel, for example reviewing the JVM TLS settings and upgrading librdkafka with Valentin's last patch (and deploying it on all the caching nodes).

@Vgutierrez from what I can tell: the only blocker to removing IPSec is deploying a new version of librdkafka with your patch, and setting the config to restrict the sigalgs for varnishkafka. Is this correct? If so, do you think we can get this done before the end of this month?

I think we can do it :).

BTW, right now we are enforcing AES ciphersuites in our TLS connections, and we are lucky that the JVM default settings enable AES intrinsics... but maybe we could make sure by being explicit about these two JVM options:

vgutierrez@kafka-jumbo1001:~$ !65
java -XX:+PrintFlagsFinal -version |grep AES
     bool UseAES                                    = true                                {product}
     bool UseAESIntrinsics                          = true                                {product}
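If this check were to be automated, one option would be to parse the PrintFlagsFinal output. The Python sketch below works on captured text only (it does not launch a JVM), and the function names are made up for the example. The ":=" alternative in the pattern is there because HotSpot prints that form for flags changed from their default.

```python
import re

# Matches lines such as "bool UseAES = true {product}".
# HotSpot prints ":=" instead of "=" when a flag was changed from its default.
FLAG_LINE = re.compile(r"^\s*(\w+)\s+(\w+)\s*:?=\s*(\S+)")

SAMPLE = """\
     bool UseAES                                    = true                                {product}
     bool UseAESIntrinsics                          = true                                {product}
"""


def parse_flags(output):
    """Map flag name to its printed value for each recognised line."""
    flags = {}
    for line in output.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            _flag_type, name, value = match.groups()
            flags[name] = value
    return flags


def aes_intrinsics_enabled(output):
    """True only if both UseAES and UseAESIntrinsics are reported as true."""
    flags = parse_flags(output)
    return flags.get("UseAES") == "true" and flags.get("UseAESIntrinsics") == "true"


if __name__ == "__main__":
    print("AES intrinsics enabled:", aes_intrinsics_enabled(SAMPLE))
```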

@Ottomata also I'm currently reviewing the TLS implementation on Kafka side, so far so good.

I've just tested a new build of librdkafka (0.11.3-1~bpo8+1+wikimedia2) on cp1008 that includes the new TLS configuration settings.
Additional config:

kafka.ssl.curves.list=P-256
kafka.ssl.sigalgs.list=ECDSA+SHA256

ClientHello during TLS handshake:

Extension: signature_algorithms
    Type: signature_algorithms (0x000d)
    Length: 4
    Signature Hash Algorithms Length: 2
    Signature Hash Algorithms (1 algorithm)
        Signature Hash Algorithm: 0x0403
            Signature Hash Algorithm Hash: SHA256 (4)
            Signature Hash Algorithm Signature: ECDSA (3)
Extension: elliptic_curves
    Type: elliptic_curves (0x000a)
    Length: 4
    Elliptic Curves Length: 2
    Elliptic curves (1 curve)
        Elliptic curve: secp256r1 (0x0017)
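For reading these dumps: in the TLS 1.2 signature_algorithms extension each entry is two bytes, the hash identifier followed by the signature identifier, so 0x0403 is SHA256 (4) with ECDSA (3). A minimal Python decoder, using only the identifiers that appear in the tshark output in this task (the function names are invented for the example):

```python
import struct

# Identifiers as they appear in the tshark output in this task.
HASHES = {1: "MD5", 2: "SHA1", 3: "SHA224", 4: "SHA256", 5: "SHA384", 6: "SHA512"}
SIGNATURES = {1: "RSA", 2: "DSA", 3: "ECDSA"}


def decode_sigalg(code_point):
    """Split a 16-bit code point into (hash, signature) names."""
    hash_id, sig_id = code_point >> 8, code_point & 0xFF
    return HASHES.get(hash_id, hex(hash_id)), SIGNATURES.get(sig_id, hex(sig_id))


def parse_sigalgs_extension(body):
    """Decode the extension body: a 2-byte list length, then 2-byte entries."""
    if len(body) < 2:
        raise ValueError("malformed signature_algorithms extension")
    (list_len,) = struct.unpack_from("!H", body, 0)
    if list_len % 2 or list_len + 2 > len(body):
        raise ValueError("malformed signature_algorithms extension")
    entries = struct.iter_unpack("!H", body[2:2 + list_len])
    return [decode_sigalg(code_point) for (code_point,) in entries]


if __name__ == "__main__":
    # Extension body from the ClientHello above: list length 2, one entry 0x0403.
    print(parse_sigalgs_extension(bytes.fromhex("00020403")))
```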

Change 440520 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled

https://gerrit.wikimedia.org/r/440520

Change 440520 merged by Ottomata:
[operations/puppet@production] kafka: Ensure JVM AES intrinsics usage if AES ciphersuites are enabled

https://gerrit.wikimedia.org/r/440520

Change 440544 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] varnishkafka: Set TLS signature algorithms and curves lists

https://gerrit.wikimedia.org/r/440544

Mentioned in SAL (#wikimedia-operations) [2018-06-26T09:09:22Z] <vgutierrez> upload librdkafka_0.11.3-1~bpo8+1+wikimedia2 to apt.w.o jessie-wikimedia - T182993

Mentioned in SAL (#wikimedia-operations) [2018-06-26T09:31:23Z] <vgutierrez> updating librdkafka1 and restarting varnishkafka-webrequest on cache::misc nodes - T182993

Mentioned in SAL (#wikimedia-operations) [2018-06-27T12:49:30Z] <vgutierrez> Upgrade librdkafka1 and restart varnishkafka-webrequest in cache::upload nodes - T182993

Mentioned in SAL (#wikimedia-operations) [2018-06-28T08:09:34Z] <vgutierrez> updating librdkafka1 && restart varnishkafka instances in cache::text nodes - T182993

Change 442794 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] varnishkafka: Set TLS curves list and sigalgs list for cache::misc

https://gerrit.wikimedia.org/r/442794

Change 440544 merged by Vgutierrez:
[operations/puppet@production] varnishkafka: Enable TLS signature algorithms and curves lists config

https://gerrit.wikimedia.org/r/440544

Change 442794 merged by Vgutierrez:
[operations/puppet@production] varnishkafka: Set TLS curves list and sigalgs list for cache::misc

https://gerrit.wikimedia.org/r/442794

Mentioned in SAL (#wikimedia-operations) [2018-06-28T09:10:56Z] <vgutierrez> Apply new TLS varnishkafka settings in cache::misc nodes - T182993

Change 442840 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] varnishkafka: Set TLS curves list and sigalgs list for cache::upload

https://gerrit.wikimedia.org/r/442840

Change 442840 merged by Vgutierrez:
[operations/puppet@production] varnishkafka: Set TLS curves list and sigalgs list for cache::upload

https://gerrit.wikimedia.org/r/442840

Mentioned in SAL (#wikimedia-operations) [2018-06-28T13:14:31Z] <vgutierrez> Apply new TLS varnishkafka settings in cache::upload nodes - T182993

Woo hoo!

Annnnd soon we disable IPSec?! :D

As soon as we roll this out on cache::text nodes, I don't have any reason to stop the IPSec deprecation here :)

Change 443043 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] varnishkafka: Set TLS curve list and sigalgs list defaults

https://gerrit.wikimedia.org/r/443043

Change 443043 merged by Vgutierrez:
[operations/puppet@production] varnishkafka: Set TLS curve list and sigalgs list defaults

https://gerrit.wikimedia.org/r/443043

Mentioned in SAL (#wikimedia-operations) [2018-06-29T08:39:23Z] <vgutierrez> Apply new TLS varnishkafka settings in cache::text nodes - T182993

Change 447004 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Remove ipsec from kafka jumbo nodes

https://gerrit.wikimedia.org/r/447004

Change 447004 merged by BBlack:
[operations/puppet@production] Remove ipsec from kafka jumbo nodes

https://gerrit.wikimedia.org/r/447004