Consider balancing VRRP primaries to cr1/cr2
Closed, Resolved; Public

Description

(Thinking mainly of eqiad/codfw here)

The way our VRRP setup currently works is that one of the two core routers (say, cr1) is the VRRP primary (previously known as master) for all VLANs, and the other one (say, cr2) is the secondary. The two routers run iBGP & OSPF with each other and with other routers in the network. Depending on eBGP AS path selection and OSPF weights, traffic may go cr1->cr2->{external to DC} or cr1->{external to DC}.

We could consider adjusting the VRRP priorities to split VLANs between the two routers -- say, cr1 is primary for rows A/B and cr2 is primary for rows C/D.
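
To make the proposed split concrete, here is a minimal Python sketch of how per-row priorities could be derived. It relies only on the fact that in VRRP the router with the higher priority becomes primary. The function name `vrrp_priorities` and the values 110/100 are hypothetical choices for illustration, not what our tooling actually uses.

```python
# Hypothetical priority values: any pair works as long as the
# intended primary gets the higher one (higher priority wins in VRRP).
PRIMARY_PRIORITY = 110
SECONDARY_PRIORITY = 100


def vrrp_priorities(primary_by_row, routers):
    """Return {router: {row: priority}} so that each row's chosen
    primary router has the higher VRRP priority for that row."""
    plan = {router: {} for router in routers}
    for row, primary in primary_by_row.items():
        if primary not in plan:
            raise ValueError(f"unknown router {primary!r} for row {row!r}")
        for router in routers:
            if router == primary:
                plan[router][row] = PRIMARY_PRIORITY
            else:
                plan[router][row] = SECONDARY_PRIORITY
    return plan


# The split suggested above: cr1 primary for rows A/B, cr2 for rows C/D.
plan = vrrp_priorities(
    {"A": "cr1", "B": "cr1", "C": "cr2", "D": "cr2"},
    ["cr1", "cr2"],
)
```

With this plan, cr1 holds the higher priority for rows A and B and cr2 for rows C and D, so each router is still the secondary for the other's rows.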

The upside is that this would provide some natural loadbalancing between multiple paths both internally (e.g. codfw->eqdfw, eqiad->codfw), and externally in the cases of equal cost AS path lengths e.g. 1299 174 on one router, 2914 174 on the other. In turn, this means that we'd always exercise paths (such as e.g. a wavelength, but also cr->switch) that today remain idle until a failure happens.

The downside (that I can see so far!) is that it may make network issues more spotty and harder to pinpoint, as the return path may follow different paths depending on the server one accesses. Traceroute to the Gerrit box may end up being different than the one to the bastion host, as an example. By extension, for loadbalanced services, this will depend on which realserver the loadbalancer picked, which in turn would be a property of source IP (due to source IP hashing), so two different users even in the same ISP or /24 may end up having a different return path.

Event Timeline

faidon triaged this task as Medium priority. Sep 17 2020, 11:15 PM
faidon created this task.

BTW, one dangerous impact of this (as with all ECMP!) is that it would be harder to notice a situation where we don't have enough capacity to carry regular amounts of traffic when one of the paths is down for whatever reason. We could perhaps mitigate this by tuning our monitoring to alert on 40-50% utilization, at least for the common cases of link redundancy (codfw/eqdfw, eqiad/codfw). This would still get us extra capacity for "abnormal" conditions (like edge in eqiad but MW & Swift in codfw etc.) while alerting us to the situation where we don't have enough capacity for normal levels of traffic.

Change 629364 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/homer/public@master] Add vrrp_master_pinning in eqiad

https://gerrit.wikimedia.org/r/629364

Change 629364 merged by jenkins-bot:
[operations/homer/public@master] Add vrrp_master_pinning in eqiad

https://gerrit.wikimedia.org/r/629364

Mentioned in SAL (#wikimedia-operations) [2020-09-24T07:57:31Z] <XioNoX> configure vrrp_master_pinning in eqiad - T263212

Change 629615 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/homer/public@master] Configure vrrp_master_pinning in codfw

https://gerrit.wikimedia.org/r/629615

Change 629615 merged by jenkins-bot:
[operations/homer/public@master] Configure vrrp_master_pinning in codfw

https://gerrit.wikimedia.org/r/629615

Mentioned in SAL (#wikimedia-operations) [2020-09-24T08:15:01Z] <XioNoX> configure vrrp_master_pinning in codfw - T263212

This is now pushed to eqiad and codfw. Result can be seen on:
https://librenms.wikimedia.org/graphs/id=16333/type=port_bits/
and
https://librenms.wikimedia.org/graphs/id=16552/type=port_bits/

The alerting part is a bit more tricky.

Ideally we would take the link state into consideration: if the twin link is down, alert at 80%; if it's up, alert when the sum of both links is at 80% of the ifSpeed of a single link. This might be doable with LibreNMS custom SQL alerts.

Another way to say this, to cover all these cases (including future scenarios where there could be more than two links, or they could differ in speed during transitions to higher speed links) would be: "Alert if the sum of traffic in the link group is >=80% of the ifSpeed of the slowest non-down link"
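
The generalized rule quoted above can be sketched in Python as follows. This is only an illustration of the logic, not a LibreNMS alert definition: the `Link` class, its field names and `group_should_alert` are hypothetical, and alerting when every link in the group is down is an assumption added here.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    speed_bps: int    # ifSpeed of the link
    traffic_bps: int  # current traffic on the link
    is_up: bool


def group_should_alert(links, threshold=0.80):
    """Alert if the sum of traffic in the link group is >= threshold
    times the ifSpeed of the slowest non-down link."""
    up_links = [link for link in links if link.is_up]
    if not up_links:
        # Assumption: a group with no usable link should always alert.
        return True
    total_traffic = sum(link.traffic_bps for link in links)
    slowest_speed = min(link.speed_bps for link in up_links)
    return total_traffic >= threshold * slowest_speed


GBPS = 1_000_000_000
pair = [
    Link("link-a", 10 * GBPS, 3 * GBPS, True),
    Link("link-b", 10 * GBPS, 3 * GBPS, True),
]
print(group_should_alert(pair))
```

Because the function takes the sum over the whole group and the minimum over the links that are up, it handles groups of any size and mixed speeds the same way as a simple pair.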

How much more complicated is it to make this more generic and handle any set of N aggregated links, not just pairs? (I'm thinking of AMSIX)

Monitoring discussion moved to T264300.
Balancing is done.