Kubernetes changeprop can't talk HTTPS to eventgate because it needs puppet CA cert
Closed, Resolved (Public)

Description

In Kubernetes, changeprop refuses to talk to eventgate over HTTPS because it cannot verify the certificate used by the endpoint.

The error is "unable to verify the first certificate". Eventgate's certificate is signed by the Puppet CA:

* Server certificate:
*  subject: CN=eventgate-main.discovery.wmnet
*  start date: Dec 17 15:54:27 2019 GMT
*  expire date: Dec 16 15:54:27 2024 GMT
*  subjectAltName: host "eventgate-main.discovery.wmnet" matched cert's "eventgate-main.discovery.wmnet"
*  issuer: CN=Puppet CA: palladium.eqiad.wmnet
*  SSL certificate verify ok.
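
For reference, Node surfaces this failure through the `code` property of the error, where "unable to verify the first certificate" corresponds to `UNABLE_TO_VERIFY_LEAF_SIGNATURE`. The following is a minimal sketch of how a client could tell a trust-store problem apart from other request failures. The helper name `isUntrustedIssuerError` is hypothetical and is not part of changeprop.

```typescript
// OpenSSL verification codes that Node reports when the issuing CA
// is not present in the trust store.
const UNTRUSTED_ISSUER_CODES: readonly string[] = [
  "UNABLE_TO_VERIFY_LEAF_SIGNATURE",   // "unable to verify the first certificate"
  "UNABLE_TO_GET_ISSUER_CERT_LOCALLY", // "unable to get local issuer certificate"
];

// Hypothetical helper: returns true when an error looks like a missing-CA
// problem rather than, say, a timeout or a refused connection.
function isUntrustedIssuerError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) {
    return false;
  }
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && UNTRUSTED_ISSUER_CODES.indexOf(code) !== -1;
}
```

A check like this only helps with diagnosis; the certificate still has to be made available to Node.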

We need to find a way to get changeprop to load this certificate. Installing the certificate in the chart is more or less trivial as other charts use the pattern of loading it via helmfile. There are three possible solutions to loading the certificate as far as I've ascertained:

  • Explicitly load the CA certificate in code
  • Get Node to load the system-wide CA certificate directory on-disk
  • Pass the certificate into node using the NODE_EXTRA_CA_CERTS environment variable (quickest but somewhat hacky)
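
To make the first option concrete, here is a rough sketch of what loading the CA explicitly in code could look like. It is not changeprop's actual code: the helper names and the `caPath` argument are assumptions for illustration. Note that passing `ca` replaces Node's default trust list, so the sketch appends the extra CA to `tls.rootCertificates` (the list bundled with Node) instead of replacing it.

```typescript
import * as fs from "fs";
import * as https from "https";
import * as tls from "tls";

// Append one extra PEM-encoded CA to an existing list of trusted CAs.
function mergeCaBundle(defaults: readonly string[], extraPem: string): string[] {
  return [...defaults, extraPem];
}

// Hypothetical helper: caPath is wherever the chart mounts the CA file.
// The returned agent can be passed as the `agent` option of https.request().
function agentTrustingExtraCa(caPath: string): https.Agent {
  const extraPem = fs.readFileSync(caPath, "utf8");
  return new https.Agent({
    ca: mergeCaBundle(tls.rootCertificates, extraPem),
  });
}
```

By contrast, the third option needs no code change at all: the process is started with `NODE_EXTRA_CA_CERTS` pointing at the mounted PEM file, and Node adds it to its trusted roots.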

Event Timeline

hnowlan triaged this task as Medium priority. Apr 7 2020, 4:39 PM

Change 587298 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] Changeprop: add puppet CA cert to environment variables

https://gerrit.wikimedia.org/r/587298

Interesting! Since we use the Puppet CA as the root CA for internal service certs, including the Puppet CA cert is probably something that should be done by default in all helm charts. @akosiaris ?

  • Explicitly load the CA certificate in code

Please no. We will end up having to release new versions of the code just for this.

  • Get Node to load the system-wide CA certificate directory on-disk

node on debian does this already (contrast this with the official release that ships with its own bundled CAs, see https://github.com/nodejs/node/blob/master/src/node_root_certs.h if you want to cry a bit inside). But indeed our images up to now don't contain our Puppet_CA (because we haven't had the need to do so up to now and because it's not entirely clear how to do that yet).

  • Pass the certificate into node using the NODE_EXTRA_CA_CERTS environment variable (quickest but somewhat hacky)

Yup, this will work but it's hacky indeed. We can proceed with that to unblock it but if there is no rush, it's probably better to figure out how to do the above.
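
One caveat with the NODE_EXTRA_CA_CERTS route: Node reads the variable only once at process startup, so it has to be set in the container environment rather than assigned to `process.env` at runtime. A small startup check along these lines could make a misconfigured path easier to spot. This is only a sketch; `checkExtraCa` and its status values are hypothetical names, not an existing API.

```typescript
import * as fs from "fs";

type ExtraCaStatus = "unset" | "unreadable" | "not-pem" | "ok";

// Hypothetical startup check: reports whether NODE_EXTRA_CA_CERTS points at
// a readable file that looks like a PEM certificate. The file reader is
// injectable so the function can be exercised without touching the disk.
function checkExtraCa(
  env: Record<string, string | undefined>,
  readFile: (path: string) => string = (p) => fs.readFileSync(p, "utf8"),
): ExtraCaStatus {
  const caPath = env["NODE_EXTRA_CA_CERTS"];
  if (!caPath) {
    return "unset";
  }
  let contents: string;
  try {
    contents = readFile(caPath);
  } catch (e) {
    return "unreadable";
  }
  return contents.indexOf("-----BEGIN CERTIFICATE-----") !== -1 ? "ok" : "not-pem";
}
```

Calling `checkExtraCa(process.env)` early in service startup and logging anything other than "ok" would be one way to use it.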

Change 587298 merged by jenkins-bot:
[operations/deployment-charts@master] Changeprop: add puppet CA cert to environment variables

https://gerrit.wikimedia.org/r/587298

I think the quick way (using NODE_EXTRA_CA_CERTS) is the right way to go, for a few reasons:

  • We will manage soon(TM) most service-to-service interactions with envoy, which will take care of terminating TLS on both sides, so whatever we do now is a temporary fix anyway
  • Adding ca-certificates to the base image isn't very harmful, but it's a few megabytes that are mostly unneeded nonetheless
  • Adding the puppet CA to our base images is even more of a breakage of the principle that base images should be pretty generic building blocks, not strictly tied to our current data.

We will manage soon(TM) most service-to-service interactions with envoy, which will take care of terminating TLS on both sides,

Does this include Kafka TLS? I include the Puppet CA cert in the eventgate-* and eventstreams helmfiles, which means I have to copy and paste it in 15 different places right now. In the longer term, would it be worthwhile to include the Puppet CA by default in pods in the default helm chart templates?

Change 587799 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] profile::kubernetes: add the puppet CA cert to general.yaml

https://gerrit.wikimedia.org/r/587799

Change 587799 merged by Hnowlan:
[operations/puppet@production] profile::kubernetes: add the puppet CA cert to general.yaml

https://gerrit.wikimedia.org/r/587799

Change 589570 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop: Use global puppet CA cert

https://gerrit.wikimedia.org/r/589570

.Values.puppet_ca_cert should be available to all helmfile-using charts now.

Change 589605 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate - No-op use .Values.puppet_ca_cert for kafka_ca_cert file

https://gerrit.wikimedia.org/r/589605

Change 589605 merged by Ottomata:
[operations/deployment-charts@master] eventgate - No-op use .Values.puppet_ca_cert for kafka_ca_cert file

https://gerrit.wikimedia.org/r/589605

Change 589621 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] Use private/general.yaml in event* hemlfile.yaml files

https://gerrit.wikimedia.org/r/589621

Change 589621 merged by Ottomata:
[operations/deployment-charts@master] Use private/general.yaml in event* hemlfile.yaml files

https://gerrit.wikimedia.org/r/589621

Change 589635 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventstreams - No-op. use global puppet ca cert

https://gerrit.wikimedia.org/r/589635

Change 589635 merged by Ottomata:
[operations/deployment-charts@master] eventstreams - No-op. use global puppet ca cert

https://gerrit.wikimedia.org/r/589635

Change 589570 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: Use global puppet CA cert

https://gerrit.wikimedia.org/r/589570

Change 592683 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop: release new version

https://gerrit.wikimedia.org/r/592683

Change 592683 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: release new version

https://gerrit.wikimedia.org/r/592683

Change 592686 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop: correct naming of puppet_ca_cert variable

https://gerrit.wikimedia.org/r/592686

Change 592686 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: correct naming of puppet_ca_cert variable

https://gerrit.wikimedia.org/r/592686