
Puppetize/stand up a load balancer for K8s API servers
Closed, Resolved · Public


To do a simple HA setup for Toolforge k8s, we'll need a trio of API servers behind a load balancer. This load balancer is probably best implemented with haproxy rather than nginx, but we need to check whether it needs to take part in TLS termination (hopefully not, but maybe). This needs puppet work and experiments. The load balancer is also what would get the DNS name and the static IP, since it should be a fully rebuildable item, unlike the API servers, which might be harder to fix so easily.

There are other ways to do this (zookeeper and such), but this is the documented and straightforward method. We could also run an active/passive pair of haproxy servers, which would let us move the proxy around fairly simply if needed.
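A minimal sketch of what the haproxy side of this could look like, in TCP (L4) pass-through mode so the proxy stays out of TLS termination; the hostnames and port are assumptions for illustration, not the actual Toolforge names:

```
# /etc/haproxy/haproxy.cfg (sketch, hypothetical hostnames)
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-api-servers

backend k8s-api-servers
    mode tcp
    balance roundrobin
    # plain TCP health checks against each apiserver
    option tcp-check
    server k8s-master-1 k8s-master-1.example.org:6443 check
    server k8s-master-2 k8s-master-2.example.org:6443 check
    server k8s-master-3 k8s-master-3.example.org:6443 check
```

With `mode tcp`, TLS is passed through end to end and the apiservers keep their own certificates; the proxy never sees plaintext.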


Event Timeline

Bstorm triaged this task as High priority.Feb 7 2019, 4:30 PM
Bstorm created this task.

@GTirloni just noted that we can round-robin DNS across two LBs! That would make this actually HA.
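Round-robin here just means publishing two A records for the same name, one per haproxy node; a hypothetical zone snippet (names and addresses are examples only):

```
; two A records for the same name: resolvers hand
; clients the addresses in rotating order
k8s.example.org.  300  IN  A  192.0.2.10   ; haproxy-1
k8s.example.org.  300  IN  A  192.0.2.11   ; haproxy-2
```

Note this only distributes load and survives a node loss for clients that retry; it's not a substitute for health checking, since DNS keeps answering with a dead node until the record is pulled.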

Change 490201 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge-k8s: set up an haproxy load balancer for HA api servers

Change 490201 merged by Bstorm:
[operations/puppet@production] toolforge-k8s: set up an haproxy load balancer for HA api servers

It looks like production is using LVS for this. Despite the starter work on haproxy, this should really use LVS too, for consistency and familiarity within the foundation. kube-apiserver is just an API, so there should be no real barrier to fronting it that way.

This might be done, but we won't know until we have a new cluster to test with that...does a bit more.

aborrero assigned this task to Bstorm.
aborrero added a subscriber: aborrero.

We have this done, thanks to @Bstorm :-) We have been testing this setup already in T215531: Deploy upgraded Kubernetes to toolsbeta.

Probably the next step is to figure out whether we need HA for this server as well, i.e., redundancy at the proxy level, so we don't depend on just one being active.

Closing task now. Feel free to reopen if required.

Just noticed a thing about this setup: we don't preserve the source address of the original client, which may complicate things if we ever need to debug.
I wonder if we should consider another proxy approach, such as an L3/L4 (NAT-based) load balancer instead.

We don't do any L7 stuff in this proxy anyway, SSL termination or the like.
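For reference, an L4 NAT-based alternative along the lines of what production uses could be sketched with LVS via `ipvsadm`. In NAT mode LVS only rewrites the destination address, so the apiservers see the real client source IP (return traffic must route back through the director). The VIP and real-server addresses below are hypothetical:

```
# LVS (NAT mode) sketch -- addresses are examples only
ipvsadm -A -t 192.0.2.20:6443 -s rr                 # virtual service, round-robin
ipvsadm -a -t 192.0.2.20:6443 -r 10.0.0.11:6443 -m  # real server 1, masquerade (NAT)
ipvsadm -a -t 192.0.2.20:6443 -r 10.0.0.12:6443 -m  # real server 2
ipvsadm -a -t 192.0.2.20:6443 -r 10.0.0.13:6443 -m  # real server 3
```

Since we do no L7 work in the proxy anyway, switching to an L4 scheme like this loses nothing and gains source-address visibility in the apiserver logs.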