What?
When adding new nodes or control planes to k8s clusters, we need to run Puppet on all nodes of the cluster to create ferm rules that allow the new node to connect to typha, which runs on some of the nodes in the node's network namespace.
This is not ideal, as it can lead to a situation where a new node is added to the cluster_nodes: list in hiera before DNS resolution works for it. Puppet then adds the node to the ferm rule (/etc/ferm/conf.d/10_calico-typha), but because resolving the A record fails, no iptables rule is created.
When DNS resolution starts working, ferm is not refreshed by Puppet, because the node is already in the ferm rule and the file therefore does not change.
Temporary Bandaid
- Move from the legacy resolution (where ferm's resolve() function does the DNS lookup) to the new srange() parameter (where DNS is resolved on the Puppet server side with every Puppet run). Done
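The difference between the two resolution styles can be modelled in a few lines. This is only a sketch of the mechanism described above, not the actual Puppet or ferm code: the function names, the host names and the rendered rule strings are illustrative, and it assumes that server-side resolution simply skips names that do not resolve yet (the real implementation may instead fail the run).

```python
from typing import Callable, Iterable, Optional

# A resolver maps a host name to an address, or None if DNS has no record yet.
Resolver = Callable[[str], Optional[str]]


def render_legacy(nodes: Iterable[str]) -> str:
    """Legacy style: host names go into the file, ferm resolves them later."""
    return "saddr @resolve((" + " ".join(sorted(nodes)) + "))"


def render_server_side(nodes: Iterable[str], resolve: Resolver) -> str:
    """New style: names are resolved while rendering the rule.

    Assumption: names that do not resolve are skipped.
    """
    addresses = [addr for addr in (resolve(n) for n in sorted(nodes)) if addr]
    return "saddr (" + " ".join(addresses) + ")"


# Hypothetical cluster: node2 is in hiera before its A record exists.
nodes = ["node1.example.org", "node2.example.org"]
dns_before = {"node1.example.org": "192.0.2.1"}
dns_after = {**dns_before, "node2.example.org": "192.0.2.2"}

# Legacy: the file content is the same on every run, so nothing refreshes ferm.
legacy_changed = render_legacy(nodes) != render_legacy(nodes)

# Server side: the content changes once DNS works, which triggers a refresh.
server_side_changed = (
    render_server_side(nodes, dns_before.get)
    != render_server_side(nodes, dns_after.get)
)

print(legacy_changed)       # False
print(server_side_changed)  # True
```

The point of the sketch is that the refresh is driven by a change in file content, so the lookup has to happen before the file is written for a late DNS record to be noticed.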
Proposal
- Relax the typha ferm rule so that we no longer need a per-host access rule.
- This would let us deprecate the cluster_nodes: config structure in hiera completely, as it is used nowhere else.
How?
Enable (if possible) authentication between calico-node and typha. Calico uses mTLS by default between typha and felix (calico-node) when deployed using the tigera-operator.
Option 1: cert-manager
That is not the case in our setup, so we'd have to provide the necessary certificates ourselves. We probably can't do that inside Kubernetes (with cert-manager), as that would require Pod networking to be up, which is not the case when initially bootstrapping a cluster.
Option 2: Generate the certificates via puppet
We could generate the certificates for typha and felix (calico-node) via Puppet on all Kubernetes nodes and mount them into the pods via a hostPath volume.
Question: Are typha and felix capable of hot-reloading certificates if they change on disk? We assume they can, as they use k8s certificate/secret objects when deployed via the operator.
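To make Option 2 more concrete, here is a rough sketch of the pod spec fragments it would need, built as Python dicts and printed as JSON. The environment variable names are the TLS settings from the Calico Typha/Felix documentation linked under Docs. Everything else is an assumption for illustration: the host directory, the mount point, the file names, the container names and the common name placeholders, which must be replaced with whatever the generated certificates actually carry.

```python
import json

# Assumptions (not from these notes): host directory, mount point, file
# names and container names are illustrative placeholders.
HOST_CERT_DIR = "/etc/calico/typha-certs"
MOUNT_DIR = "/typha-certs"
VOLUME = "typha-certs"


def env(pairs: dict) -> list:
    """Convert a name -> value mapping into a Kubernetes env list."""
    return [{"name": name, "value": value} for name, value in pairs.items()]


def pod_spec_fragment(container: str, env_vars: dict) -> dict:
    """Pod spec fragment: hostPath volume, read-only mount and TLS settings."""
    return {
        "volumes": [
            {"name": VOLUME, "hostPath": {"path": HOST_CERT_DIR, "type": "Directory"}}
        ],
        "containers": [
            {
                "name": container,
                "volumeMounts": [
                    {"name": VOLUME, "mountPath": MOUNT_DIR, "readOnly": True}
                ],
                "env": env(env_vars),
            }
        ],
    }


# Typha side: its own server certificate plus the CN it expects from clients.
typha = pod_spec_fragment("calico-typha", {
    "TYPHA_CAFILE": f"{MOUNT_DIR}/ca.pem",
    "TYPHA_SERVERCERTFILE": f"{MOUNT_DIR}/typha.pem",
    "TYPHA_SERVERKEYFILE": f"{MOUNT_DIR}/typha.key",
    "TYPHA_CLIENTCN": "<CN of the Felix certificate>",
})

# Felix side: its own client certificate plus the CN it expects from Typha.
felix = pod_spec_fragment("calico-node", {
    "FELIX_TYPHACAFILE": f"{MOUNT_DIR}/ca.pem",
    "FELIX_TYPHACERTFILE": f"{MOUNT_DIR}/felix.pem",
    "FELIX_TYPHAKEYFILE": f"{MOUNT_DIR}/felix.key",
    "FELIX_TYPHACN": "<CN of the Felix peer, i.e. the Typha certificate>",
})

print(json.dumps({"typha": typha, "felix": felix}, indent=2))
```

Mounting the directory rather than individual files is a deliberate choice in this sketch: if Puppet replaces a certificate by writing a new file, a directory mount should still show the new file inside the pod.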
Now What?
We need two certificates, one available to Typha (the server) and one available to Felix (the client):
- Typha: Common Name: typha-server
  - extended key usage ServerAuth
- Felix: Common Name: typha-client
  - extended key usage ClientAuth
Providing certs to pods
- Secrets
  - The certificates could be stored as Secrets, which we can then mount as files.
- Files on workers
  - We could have those certs present on all workers and mount them via hostPath.
Open questions
- Cert expiration & renewal
  - How often should those certs expire?
  - Can typha and/or calico detect cert changes?
  - If a cert is renewed, will it be immediately available?
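One way to investigate the renewal questions is to check, from wherever the certificate is consumed, whether the file content has actually changed. The following is a minimal sketch using only the Python standard library; the file name is a placeholder and the temporary directory stands in for the real certificate directory. It says nothing about whether typha or felix reload the file themselves, which still has to be tested separately.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 of the file contents; differs whenever the cert is rewritten."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


# Demo with a temporary directory standing in for the certificate directory.
with tempfile.TemporaryDirectory() as tmp:
    cert = Path(tmp) / "typha.pem"  # hypothetical file name
    cert.write_text("old certificate")
    seen = fingerprint(cert)

    # Nothing happened yet: the fingerprint matches what we saw before.
    unchanged = fingerprint(cert) != seen

    # Simulate a renewal writing new content to the same path.
    cert.write_text("renewed certificate")
    changed = fingerprint(cert) != seen

print(unchanged, changed)  # False True
```

If hot reloading turns out not to work, a comparison like this could also serve as the trigger for restarting the affected pods after a renewal.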
Docs
https://docs.tigera.io/calico/3.26/reference/typha/configuration#felix-typha-tls-configuration
https://docs.tigera.io/calico/3.26/network-policy/comms/crypto-auth