weighted maglev viability for low-traffic services
Open, Medium, Public
Authored By

	Vgutierrez
	Wed, Jun 26, 2:50 PM

Description

Katran only provides a weighted maglev scheduler, so we won't be able to replicate 1:1 the current low-traffic setup if we decide to migrate from IPVS to Katran.

As an alternative, we could load balance on both the source IP and the source port to achieve a proper distribution of incoming requests across the available realservers. As a side effect, a single client (in most cases a CDN server) will potentially hit several realservers instead of just one.

Is this something that we could accept? Are there any well-known blockers to this approach?

Related Objects

		Status	Subtype	Assigned	Task
		In Progress		Vgutierrez	T332027 Replace current L4LB with Katran-based alternative
		Open		Vgutierrez	T368545 weighted maglev viability for low-traffic services

Event Timeline

Vgutierrez created this task. Wed, Jun 26, 2:50 PM

Restricted Application added a project: Infrastructure-Foundations. Wed, Jun 26, 2:50 PM

Restricted Application added a subscriber: Aklapper.

Vgutierrez triaged this task as Medium priority. Wed, Jun 26, 2:51 PM

Strictly on the network side, there is no blocker one way or the other.

I think I'm missing some context: what's the current low-traffic setup? What's the downside of only using the source IP in the hashing? Not enough clients for proper balancing?

In T368545#9929335, @ayounsi wrote:

I think I'm missing some context: what's the current low-traffic setup?

Usually, low-traffic services use wrr (weighted round robin) to balance traffic across nodes.
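
For context, here is a minimal sketch of what weighted round robin does. It is a simplified illustration, not the IPVS wrr implementation, and the realserver names and weights are made up. The point is that the choice of realserver depends only on the rotation, never on who the client is, so the split follows the weights even when there are very few client IPs.

```python
from collections import Counter
from itertools import cycle, islice


def weighted_round_robin(weights: dict[str, int]):
    """Yield realservers in a repeating cycle, each appearing `weight` times.

    Simplified sketch: real schedulers interleave the picks more smoothly,
    but the per-cycle proportions are the same.
    """
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)


# Hypothetical pool: realserver1 has twice the weight of the others.
weights = {"realserver1": 2, "realserver2": 1, "realserver3": 1}

# Two full cycles of connections, regardless of which client opened them.
picks = list(islice(weighted_round_robin(weights), 8))
per_server = Counter(picks)
print(per_server)
```

Over any whole number of cycles the connection counts match the weights exactly, which is the property a hash-based scheduler cannot guarantee.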

In T368545#9929335, @ayounsi wrote:

What's the downside of only using the source IP in the hashing? Not enough clients for proper balancing?

Yep, that's what I suspect: using the source IP only would lead to some realservers getting significantly more traffic than others, given the relatively small pool of source IPs. And it looks like that behavior of maglev is already being (ab)used by the thanos-web low-traffic service:

# Needed for SSO sessions to stick. As of Dec 2022 backend
# selection from varnish -> ATS is always random for pass traffic
# therefore even with this we need at most one thanos-fe host
# pooled at a time (per site).
scheduler: mh

It is pretty clear to me that the only way to have fair load balancing with maglev is if we do the consistent hashing using the remote port as well.

We're in a situation where we have fewer client IPs than backends for virtually every service, so source IP hashing would never be enough to get a fair distribution.
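
To illustrate the argument, here is a rough simulation. It uses a plain hash-modulo pick rather than a real Maglev lookup table, and the client IPs, port range, and realserver names are all hypothetical. With 3 client IPs and 8 realservers, hashing on the source IP alone can reach at most 3 realservers, while adding the source port spreads the same flows over the whole pool.

```python
import hashlib
from collections import Counter


def pick_backend(key: str, backends: list[str]) -> str:
    """Map a flow key to a backend with a stable hash.

    Stand-in for consistent hashing: hash-modulo, not Maglev's table.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def distribution(flows, backends, use_port: bool) -> Counter:
    """Count flows per backend, hashing on source IP or on IP plus port."""
    counts = Counter({backend: 0 for backend in backends})
    for src_ip, src_port in flows:
        key = f"{src_ip}:{src_port}" if use_port else src_ip
        counts[pick_backend(key, backends)] += 1
    return counts


# Hypothetical pool: fewer client IPs (3) than realservers (8).
backends = [f"realserver{i}" for i in range(8)]
client_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flows = [(ip, port) for ip in client_ips for port in range(32768, 33768)]

by_ip = distribution(flows, backends, use_port=False)
by_ip_port = distribution(flows, backends, use_port=True)

print("source IP only:   ", dict(by_ip))
print("source IP + port: ", dict(by_ip_port))
```

The trade-off is the one raised in the description: once the port is part of the key, connections from the same client no longer stick to one realserver.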

jhathaway subscribed. Thu, Jun 27, 2:05 PM
