PyBal BGP group prefix-limit 50 teardown
Closed, Resolved (Public)

Description

group PyBal has this configured:

family inet {
    unicast {
        prefix-limit {
            maximum 50;
            teardown;
        }
    }
}
family inet6 {
    unicast {
        prefix-limit {
            maximum 50;
            teardown;
        }
    }
}

This means that if any LVS advertises more than 50 prefixes (VIPs), the router will shut down the BGP session.
It was probably set up long ago and then forgotten.
This is usually useful for peers we don't trust, so if they screw up their BGP config they don't send us faulty prefixes.

lvs1016 is now at 48 prefixes, codfw top peer at 42.
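
To put those numbers in perspective, here is a quick sketch of the headroom left under the current maximum of 50. The peer counts are the ones quoted above; the `headroom` helper is only an illustrative name, not something that exists in our tooling.

```python
# Prefix counts quoted above; 50 is the currently configured maximum.
CURRENT_MAXIMUM = 50
top_peers = {"lvs1016": 48, "codfw top peer": 42}


def headroom(advertised, maximum):
    """Return (prefixes left before reaching the maximum, utilisation in percent)."""
    return maximum - advertised, 100 * advertised / maximum


for peer, count in top_peers.items():
    left, used = headroom(count, CURRENT_MAXIMUM)
    print(f"{peer}: {count}/{CURRENT_MAXIMUM} prefixes ({used:.0f}% used, {left} left)")
```

So lvs1016 is only a couple of new VIPs away from the limit.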

I see 3 options:

  • The easiest is to keep the status quo but bump it to 100 and add a warning log at 80%.
set protocols bgp group PyBal family inet unicast prefix-limit maximum 100 teardown 80
set protocols bgp group PyBal family inet6 unicast prefix-limit maximum 100 teardown 80

Not sure we will catch it before we have log alerting but it's better than nothing.

  • As LVS are trusted peers, we can also remove the teardown, so we start getting logs if we have more than 100 prefixes, but don't take the session down.

But without log alerting we might never know if there is an issue.

  • The last option is to bump the current value to something very large like 1000 with teardown 80, so we don't risk forgetting it and hitting the same problem in the future, but we still have some safeguards if PyBal starts misbehaving.

The 3rd option is my preferred one.

Event Timeline

ayounsi triaged this task as High priority. Feb 25 2020, 2:37 PM
ayounsi created this task.

Option number three looks good, but IMHO I'd decrease the teardown percentage.

For what it's worth, the Kubernetes groups (v4, v6) were copy-pasted from the PyBal group, so it probably makes sense to follow the same approach for those as well. There is one extra interesting thing here: each node is going to be advertising at least a /26 [1]. Currently the node count is 6, so 6 prefixes are being advertised, but we expect that to grow to 14-15 real soon (next couple of weeks?) and overall the goal is probably ~250-300 nodes, if not more.

[1] I say at least because, in a bad scenario we experienced due to operator error (yours truly), summarization failed and we ended up advertising something like 40 /32 prefixes.

If this happens, would you be fine with the session being shut down? The routers would handle those prefixes fine, as long as they are not bogus (as in replacing legit advertisements and thus causing routing issues).

Would the advertised prefixes in this case be withdrawn and thus make the pods unreachable from the rest of the infrastructure? If yes, that would cause a major outage.

Ok, looks like a good one would be:

set protocols bgp group PyBal family inet unicast prefix-limit maximum 1000 teardown 20
set protocols bgp group PyBal family inet6 unicast prefix-limit maximum 1000 teardown 20
set protocols bgp group Kubernetes4 family inet unicast prefix-limit maximum 2000 teardown 80
set protocols bgp group Kubernetes6 family inet6 unicast prefix-limit maximum 2000 teardown 80
set protocols bgp group Fundraising family inet unicast prefix-limit teardown 80
set protocols bgp group Fundraising family inet6 unicast prefix-limit teardown 80
set protocols bgp group Anycast4 family inet unicast prefix-limit teardown 80

This adds logging for all groups, sets the PyBal warning threshold to a lower value (200 prefixes, i.e. 20% of 1000), and bumps Kubernetes to something that would not be hit by a misconfiguration that prevents summarization.

I'll push it tomorrow if no objections.

+1 to bumping the limit, although the snippet above has 20, not 200, as the limit for PyBal if I'm reading correctly.

The syntax is not obvious: maximum 1000 teardown 20 means shut down the session at 1000 prefixes, but start sending warning logs at 20% of the 1000.
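
As a sanity check of that reading, here is a small sketch that turns a `maximum N teardown P` pair into the two resulting thresholds. The function name is made up for illustration, and the integer rounding is an assumption (it makes no difference for the values proposed here).

```python
def prefix_limit_thresholds(maximum, warn_percent):
    """Interpret 'prefix-limit maximum <maximum> teardown <warn_percent>'.

    Returns (warning_threshold, teardown_threshold), both in prefixes.
    Rounding down is an assumption; the router's exact rounding is not checked here.
    """
    return maximum * warn_percent // 100, maximum


# Values from the proposed configuration above.
proposed = {
    "PyBal": (1000, 20),
    "Kubernetes4": (2000, 80),
    "Kubernetes6": (2000, 80),
}

for group, (maximum, percent) in proposed.items():
    warn_at, teardown_at = prefix_limit_thresholds(maximum, percent)
    print(f"{group}: warning logs from {warn_at} prefixes, teardown at {teardown_at}")
```

For PyBal this gives warnings from 200 prefixes and a teardown at 1000, which is where the "200" comes from.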

Mentioned in SAL (#wikimedia-operations) [2020-02-27T12:41:11Z] <XioNoX> bump BGP prefix-limit on all routers - T246110