
Pybal should reject a confctl configuration that indicates only one cp-text is pooled
Closed, Duplicate · Public

Event Timeline

jbond triaged this task as Medium priority. Feb 13 2020, 11:46 AM

Personally I don't think Pybal should be rejecting that; it's a valid configuration from a technical standpoint, and there can be valid reasons to have it, at least temporarily. But we may decide that in our specific environment that should be avoided at all cost, so perhaps that logic should be implemented elsewhere - in the code that manages pooling state.

This is confusing to me, as I've heard from @Joe and a couple of others that they thought Pybal already worked the way the task title describes.

Is the min-pooled threshold 1? So that it would reject a configuration with nothing at all pooled, but 1 node pooled is OK?

akosiaris subscribed.

Removing SRE, triaging to Traffic-Icebox since this is about pybal and is pretty old.

I'm not sure I understand the title: should depooling be refused when only one host would remain, or when the sole remaining host would be depooled, leaving zero pooled?

It appears that the depool-threshold configuration in pybal.conf is set to 0.5, which, according to this 2015 incident report, means:

[…] after the first half have been depooled, it will refuse to depool any further backends regardless of healthcheck state. This condition of pybal having half of its backends for a service depooled, and all failing healthchecks, [doesn't] trigger any icinga alerts. Service [continues] to be healthy from an external perspective under these conditions.
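
To make the quoted behaviour concrete, here is a minimal, hypothetical simulation (not PyBal code; simulate_failures and its variable names are my own) of successive backend failures against a 0.5 depool-threshold, assuming the failing host is already counted as down when the check runs:

# Hypothetical simulation of the behaviour described in the incident report:
# with depool-threshold = 0.5, further depools are refused once half of the
# backends have already been depooled. Not PyBal code; names are illustrative.

def simulate_failures(total_backends, depool_threshold=0.5):
    up = total_backends       # backends still passing health checks
    depooled = 0              # backends removed from the pool so far
    for _ in range(total_backends):
        up -= 1               # one more backend fails its health check
        # Same comparison as canDepool(): allow the depool only if the up
        # count stays at or above total * threshold.
        if up >= total_backends * depool_threshold:
            depooled += 1     # depool allowed
        else:
            break             # depool refused; the failing host stays pooled
    return depooled

print(simulate_failures(10))  # 5 -> after half are depooled, no further depools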

From the canDepool() function in coordinator.py:

# Total number of servers
totalServerCount = len(self.servers)

# Number of hosts considered to be up by PyBal's monitoring and
# administratively enabled. Under normal circumstances, they would be
# the hosts serving traffic.
# However, a host can go down after PyBal has reached the depool
# threshold for the service the host belongs to. In that case, the
# misbehaving server is kept pooled. This count does not include such
# hosts.
upServerCount = sum(
    1
    for server
    in self.servers.itervalues()
    if server.up and server.enabled)

# The total amount of hosts serving traffic may never drop below a
# configured threshold
return upServerCount >= totalServerCount * self.lvsservice.getDepoolThreshold()

If I understand this correctly, with only one host in the pool, 1 >= 1 * 0.5 would return True, i.e. PyBal would allow the remaining host to be depooled.
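
For reference, the comparison on the last line can be evaluated in isolation; can_depool below is just that arithmetic with illustrative names and the threshold passed in explicitly, not PyBal's actual API:

# The bare comparison from canDepool(), with illustrative names and the
# 0.5 threshold from pybal.conf passed in as a default argument.

def can_depool(up_server_count, total_server_count, depool_threshold=0.5):
    return up_server_count >= total_server_count * depool_threshold

print(can_depool(1, 1))    # True:  1 >= 1 * 0.5, the single-host case above
print(can_depool(4, 10))   # False: 4 >= 10 * 0.5 fails, so depooling is refused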