
IPVS issues with UDP services, pybal depooling strategy
Open, Medium, Public

Description

Last week's reboot of hydrogen, one of the two recursive DNS servers (recdns) in eqiad, caused a number of issues.

Currently, pybal depools servers by removing them from the virtual service (ipvsadm -d). IPVS has known packet loss issues when removing servers from UDP virtual services.

We should update pybal to do the following in case of planned maintenance:

  1. set weight to zero
  2. schedule server removal after a certain amount of time (if still under maintenance)

Similarly, in case of service failure:

  1. set weight to zero
  2. if failure persists, remove server
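
The two-phase strategy above can be sketched roughly as follows. This is only an illustration, not PyBal code: `DepoolTracker`, `ipvsadm_args`, and the grace period are hypothetical names chosen for the example. It assumes an ipvsadm build that accepts `-w 0` on `-e`, and it leaves out the forwarding method and any other per-server options a real invocation would carry.

```python
from typing import List, Optional


def ipvsadm_args(action: str, vip: str, server: str, udp: bool = True) -> List[str]:
    """Build an ipvsadm argv for one step of the two-phase depool (hypothetical helper)."""
    proto = "-u" if udp else "-t"
    if action == "drain":
        # Edit the existing real server and set its weight to zero.
        return ["ipvsadm", "-e", proto, vip, "-r", server, "-w", "0"]
    if action == "remove":
        # Delete the real server from the virtual service.
        return ["ipvsadm", "-d", proto, vip, "-r", server]
    raise ValueError(f"unknown action: {action}")


class DepoolTracker:
    """Tracks one server: pooled -> drained (weight 0) -> removed."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        self.state = "pooled"
        self.drained_at: Optional[float] = None

    def report_down(self, now: float) -> Optional[str]:
        """Return the action to take ("drain", "remove") or None if nothing changes."""
        if self.state == "pooled":
            self.state, self.drained_at = "drained", now
            return "drain"
        if self.state == "drained" and now - self.drained_at >= self.grace_seconds:
            self.state = "removed"
            return "remove"
        return None

    def report_up(self) -> None:
        """Server is healthy or back from maintenance; restoring weight is out of scope here."""
        self.state, self.drained_at = "pooled", None
```

A caller would feed monitor results (or maintenance state) into `report_down` with a timestamp and run the returned action through `ipvsadm_args`. A server that recovers within the grace period is never deleted from the service.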

We should also consider enabling expire_nodest_conn. From ipvs-sysctl.txt:

expire_nodest_conn - BOOLEAN
        0 - disabled (default)
        not 0 - enabled

        The default value is 0, the load balancer will silently drop
        packets when its destination server is not available. It may
        be useful, when user-space monitoring program deletes the
        destination server (because of server overload or wrong
        detection) and add back the server later, and the connections
        to the server can continue.

        If this feature is enabled, the load balancer will expire the
        connection immediately when a packet arrives and its
        destination server is not available, then the client program
        will be notified that the connection is closed. This is
        equivalent to the feature some people requires to flush
        connections when its destination is not available.
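
For reference, a minimal sketch of reading and setting this sysctl from Python. IPVS sysctls are exposed under /proc/sys/net/ipv4/vs/ once the ip_vs module is loaded; the helper names and the `base` parameter are assumptions made for the example, and writing requires root.

```python
from pathlib import Path

# Directory where the kernel exposes IPVS sysctls (present once ip_vs is loaded).
IPVS_SYSCTL_DIR = Path("/proc/sys/net/ipv4/vs")


def read_ipvs_sysctl(name: str, base: Path = IPVS_SYSCTL_DIR) -> int:
    """Return the current integer value of an IPVS sysctl such as expire_nodest_conn."""
    return int((Path(base) / name).read_text().strip())


def write_ipvs_sysctl(name: str, value: int, base: Path = IPVS_SYSCTL_DIR) -> None:
    """Set an IPVS sysctl; on a real system this needs root privileges."""
    (Path(base) / name).write_text(f"{int(value)}\n")
```

For example, `write_ipvs_sysctl("expire_nodest_conn", 1)` would enable the behaviour described above. Because the setting is global, it applies to every virtual service on the load balancer, TCP as well as UDP.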

Event Timeline

ema triaged this task as Medium priority. Jul 31 2017, 11:56 AM
ema added a project: PyBal.

+1. Reaching these simple goals involves several tricky steps, though, and because the sysctls affect all services, we have to keep the TCP cases in mind as well:

  1. We need to set the related ipvs conn_reuse_mode sysctl to 2 before any of this. It's easy, it's an improvement today, and probably more of an improvement with everything else below.
  2. Pybal + ipvsadm need fixups and/or deployed version updates as appropriate:
    1. Current jessie ipvsadm doesn't support weight=0
    2. Current jessie ipvsadm doesn't support setting the sh scheduler flag we need to make weight=0 work sanely
    3. Current PyBal doesn't support weight=0
  3. Pybal needs to update its failure-monitoring depooling strategy before we turn on expire_nodest_conn - some of the monitors are too flappy, and flapping to a full backend-delete with expire_nodest_conn=1 has a lot more impact than without it. So before we turn on the sysctl, PyBal first has to get smarter about "weight=0 first, then remove later when failure persists".
  4. Our maintenance tooling needs to get smarter about weight=0 periods as well, but turning on expire_nodest_conn before these are all fixed is ok. Since maintenance doesn't really flap pointlessly, and almost always the service ends up shutting off at least briefly and losing all TCP connections anyways, either setting of the sysctl without a weight=0 period has about the same effect.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!