To deploy a new service to production on kubernetes right now, there is a set of things that need to be done, marked as [SRE] or [service owner] in the lists below.
Group A: deployment of the service on kubernetes
- set up the appropriate set of values in helmfiles in deployment-charts [service owner]
- set up the user token/credentials and other private data in the private puppet repository [SRE]
- set up the corresponding namespace on all clusters [SRE]
- deploy the service to all clusters [service owner]
Group B consists entirely of SRE actions.
Group B: Setting up LVS (load balancer)
- Add the new service to every kube worker in conftool-data and in discovery
- Add the service IP to all the kube workers on loopback
- Add the service to DNS records (both normal and discovery)
- Fill in the service metadata to change the LVS configuration
- Restart all the relevant LVS
- Switch the monitoring of these endpoints to critical: true in puppet to add paging
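To make the amount of per-service state concrete, here is a rough sketch of the metadata the steps above manipulate, as a plain data structure. The field names are illustrative only and do not match the real puppet/conftool schemas:

```python
def validate_service_entry(entry):
    """Check that a service entry carries the pieces of state the LVS
    setup steps above need. Field names are hypothetical."""
    required = {"name", "ip", "port", "monitoring_critical"}
    missing = required - entry.keys()
    if missing:
        raise ValueError("missing fields: %s" % sorted(missing))
    return True

# Hypothetical service entry; the IP is the one that would be added on
# the workers' loopback and announced by the load balancers.
example_service = {
    "name": "echostore",          # made-up service name
    "ip": "10.2.2.42",            # made-up service IP
    "port": 8082,
    "monitoring_critical": True,  # critical: true => paging enabled
}
```

Each of those fields currently ends up spread across conftool-data, DNS, and puppet, which is what makes the process long and failure-prone.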
While optimizing or eliminating any of these steps is nice, the biggest time-sink is without doubt the setup of LVS. It's long, complicated, and failure-prone, and only a handful of SREs are confident with the process.
How can we make that process better? In the remainder of this task I'll describe a few possible approaches.
Set up an ingress
This means allowing service owners to set up kubernetes resources that are specifically tailored to load-balancing and routing traffic coming from outside the cluster. It provides a pretty simple interface to manage externally. Unlike the other proposals below, this is an L7 load balancer, meaning it understands TLS, virtual hosts, HTTP, etc.
Depending on where the implementing software is installed, we have the following two paths. Both are valid approaches.
Inside kubernetes cluster
It means a request coming from the public would go as follows:
client => LB (pybal) => kube worker => kube-proxy (not a real hop, but does DNAT) => ingress => pods
Outside the kubernetes cluster
client => LB (pybal) => ingress => pods
This would add one node to the chain of proxying, and more moving parts. We would need to investigate Ingress solutions once more, as we've done in T170121, which was ~2.5 years ago. Things have changed since then.
What would the Group B actions look like
- Add a CNAME record pointing to the ingress.
- Add some monitoring/alerting
That's it. A simple puppet patch and a simple dns patch. For very large services, maybe add a per-namespace setup.
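For illustration, the per-service piece a service owner would add to their chart is roughly a single Ingress resource. The sketch below builds one as a plain dict; the service name, hostname, and port are made-up examples, and a real chart would template them from helmfile values:

```python
def make_ingress(service, host, port):
    """Build a minimal networking.k8s.io/v1 Ingress manifest as a dict,
    routing all HTTP traffic for `host` to the given service/port.
    All argument values in this sketch are hypothetical."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": service, "namespace": service},
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {
                        "name": service,
                        "port": {"number": port},
                    }},
                }]},
            }],
        },
    }

ingress = make_ingress("echostore", "echostore.discovery.wmnet", 8082)
```

Note this is the interface's shape, not a drop-in manifest: TLS stanzas, annotations for the chosen ingress controller, and per-namespace details would come on top.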
Pros/cons
pros:
- Integrated with kubernetes
- industry "standard"
- the rest of the infrastructure is left unmodified
- L7 functionality.
cons
- more moving parts
- Ingress quality/stability at scale needs to be evaluated.
- yet more complexity in our charts
- Adding a potential SPOF (that is in some aspects addressable) as well as a potential chokepoint.
- Essentially HTTP only. Specific implementations may support more protocols, but overall the Ingress resource wasn't designed for this.
Modify pybal to autoconfigure pools from k8s
We already have a pybal patch ensuring we can fetch the list of active workers from k8s instead of etcd, but we could expand it further to read /all/ the data pybal needs from k8s, including pods. We would still need a consistent way to add IPs to the load balancers and the k8s nodes, but that can mostly be done with some additional improvements [citation needed].
The flow of requests would be client => LB (pybal) => pod (via kube-proxy)
What would the Group B actions look like
- Add the dns record for the new service
- Add the realserver IP on the kubernetes workers and the load balancers
- Let pybal add the configuration when the service is properly annotated.
- Add monitoring/alerting.
Three relatively simple patches (one to DNS, two to puppet). Some coordination is needed.
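The pybal side of this could look roughly like the following: a function turning a kubernetes Endpoints object (here a pre-parsed dict) into the host list pybal would pool, gated on a hypothetical annotation marking the service as LVS-managed. This is a sketch of the idea, not pybal's actual data model or config format:

```python
# Hypothetical annotation name; not an existing convention.
LVS_ANNOTATION = "example.org/lvs-service"

def pool_from_endpoints(endpoints):
    """Derive a pybal-style pool from a k8s Endpoints object (as a dict).

    Returns None when the service isn't annotated for LVS management,
    otherwise a list of {host, port, enabled} entries."""
    annotations = endpoints["metadata"].get("annotations", {})
    if annotations.get(LVS_ANNOTATION) != "true":
        return None
    hosts = []
    for subset in endpoints.get("subsets", []):
        port = subset["ports"][0]["port"]
        for addr in subset.get("addresses", []):
            hosts.append({"host": addr["ip"], "port": port, "enabled": True})
    return hosts

# Sample Endpoints data, shaped like the k8s v1 API but with made-up values.
sample = {
    "metadata": {"name": "echostore",
                 "annotations": {"example.org/lvs-service": "true"}},
    "subsets": [{"ports": [{"port": 8082}],
                 "addresses": [{"ip": "10.64.0.10"}, {"ip": "10.64.0.11"}]}],
}
```

In practice this would be a watch on the Endpoints API rather than a one-shot transform, so pools track pod churn without restarts.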
Pros/cons
pros:
- no change to our current setup
- Known unknowns. Pybal is mostly 'boring'
- LVS-DR
cons:
- Still an invented-here solution
- Service addition is not fully automated; we will still need to add IPs to the backends somehow.
- Will need significant dev effort
- Lack of L7 support
kube-proxy + bird
In this hypothesis, we'd have kube-proxy doing all the load-balancing, and announcing the LVS IPs via bird directly.
In this case, we'd have the simplest request flow:
client => kube-proxy => pod
In this scenario, we would configure some bgp daemon depending on which IPs we have configured on k8s, and run it as a sidecar of kube-proxy. One of the complications is that calico relies on running bird on each kubernetes node, so we'd either have to set up kube-proxy+bird outside the cluster (mostly ending up resembling LVS), or we would need to figure out how to augment calico's bird configuration if we want to host it on the workers. This is essentially a variant of the pybal approach above.
What would the Group B actions look like
- Add the dns record for the new service
- Add the realserver IP on the kubernetes workers (this can probably be automated by using annotations in the k8s api, but is it worth it?)
- Add monitoring/alerting.
Three relatively simple patches (one to DNS, two to puppet).
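Generating the bird side from the configured service IPs would be mechanical. A sketch, rendering a simplified static-protocol stanza announcing each service IP as a /32; the stanza is illustrative bird-1.x-style syntax, not a drop-in config, and the IPs are made up:

```python
def bird_static_routes(service_ips):
    """Render a simplified bird static-protocol stanza announcing each
    service IP as a /32 route. Illustrative only: a real deployment also
    needs the bgp protocol/peer definitions and export filters, and would
    have to coexist with calico's own bird configuration."""
    lines = ["protocol static lvs_services {"]
    for ip in sorted(service_ips):
        lines.append('    route %s/32 via "lo";' % ip)
    lines.append("}")
    return "\n".join(lines)

config = bird_static_routes(["10.2.2.42", "10.2.2.17"])
```

The interesting (and costly) part isn't this rendering but deciding *which* IPs to announce from which node, which is the "additional configuration" listed in the cons below.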
Pros/cons
pros:
- least hops for a request
- No additional moving parts besides bird
- Overall the simplest configuration
cons:
- Unknown cost of working on a solid bgp announcement system.
- Might need additional configuration to know which IPs to serve
- Lack of L7 support
- No LVS-DR
Refactor the LVS setup across DNS and puppet
It's probably possible to simplify the steps to set up a load-balanced service by rationalizing the puppet code around it (for example, by synchronizing systems across the various stages, or by allowing a new service to be added in a single patch instead of three different ones).
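One way to picture the goal: a single service definition from which the DNS, loopback-IP, and LVS pieces are all derived, so adding a service becomes one patch. Purely illustrative, with made-up field and record names:

```python
def expand_service(defn):
    """From one service definition, derive the three artifacts that today
    live in separate patches (DNS, puppet loopback IPs, LVS pool).
    All names and formats here are hypothetical."""
    return {
        "dns_records": ["%s.svc.example.wmnet -> %s" % (defn["name"], defn["ip"])],
        "loopback_ips": [defn["ip"]],
        "lvs_pool": {"name": defn["name"], "port": defn["port"]},
    }

artifacts = expand_service({"name": "echostore", "ip": "10.2.2.42", "port": 8082})
```

Whether the existing puppet/DNS tooling can be bent into consuming one source of truth like this is exactly the "no clear implementation idea" con below.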
pros
- no new technology would be introduced in production
cons
- No clear implementation idea
- Might never achieve a fully streamlined solution