Deploy pybal with BGP MED support (for primary/backup) in production
Closed, Resolved (Public)

Description

Pybal was extended with support for the BGP MED (Multi-Exit Discriminator, aka metric) attribute in a6ae55449d6986d469e808d8b87d21158f59ccda. I've tested it against a Quagga instance on pybal-test2002, using a pybal instance on pybal-test2003.
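For readers unfamiliar with the attribute, here is a minimal sketch of how MED appears on the wire per RFC 4271: an optional, non-transitive path attribute with type code 4 and a 4-byte unsigned value. This is not PyBal's code; the function name `encode_med` is made up for illustration.

```python
import struct

# RFC 4271 path attribute header fields for MULTI_EXIT_DISC
FLAG_OPTIONAL_NON_TRANSITIVE = 0x80  # optional bit set, transitive bit clear
TYPE_MULTI_EXIT_DISC = 4             # attribute type code for MED
MED_LENGTH = 4                       # value is a 4-octet unsigned integer


def encode_med(value):
    """Encode a MED value as a BGP path attribute (hypothetical helper)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("MED must fit in an unsigned 32-bit integer")
    # "!" selects network byte order with no padding: flags, type, length, value
    return struct.pack("!BBBI", FLAG_OPTIONAL_NON_TRANSITIVE,
                       TYPE_MULTI_EXIT_DISC, MED_LENGTH, value)


print(encode_med(100).hex())
```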

It would be good to get this deployed in production, replacing the current routing policies on the core routers and allowing pybal to indicate primary/backup status itself. This could then even be driven from e.g. etcd.

Event Timeline

Restricted Application added a subscriber: Aklapper.
ema triaged this task as Medium priority. Jul 18 2017, 1:26 PM

Change 378920 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/pybal@1.14] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/378920

Change 378920 merged by Ema:
[operations/debs/pybal@1.14] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/378920

Change 380459 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/pybal@master] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/380459

Change 380459 merged by Ema:
[operations/debs/pybal@master] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/380459

Change 380516 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] pybal: BGP MED configuration

https://gerrit.wikimedia.org/r/380516

Mentioned in SAL (#wikimedia-operations) [2017-10-09T09:51:36Z] <ema> test setting bgp med on lvs3001/3003 T165584

Change 380516 merged by Ema:
[operations/puppet@production] pybal: BGP MED configuration

https://gerrit.wikimedia.org/r/380516

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:26:39Z] <ema> restart pybal on esams load balancers to pick up bgp-med config change T165584

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:31:09Z] <ema> restart pybal on ulsfo load balancers to pick up bgp-med config change T165584

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:38:03Z] <ema> restart pybal on codfw load balancers to pick up bgp-med config change T165584

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:47:27Z] <ema> restart pybal on eqiad load balancers to pick up bgp-med config change T165584

All load balancers are now using BGP MED. Primaries send the MED attribute with a value of 0, backups send 100.
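As a rough illustration of why 0 versus 100 gives primary/backup behaviour: when two otherwise equal routes for the same prefix reach a router, the one with the lower MED wins. The sketch below models only that single tie-breaking step, not the full BGP decision process (local preference, AS path length and origin are compared first). The host names and the `Advertisement` type are hypothetical.

```python
from dataclasses import dataclass

PRIMARY_MED = 0    # value sent by primaries, per the comment above
BACKUP_MED = 100   # value sent by backups


@dataclass(frozen=True)
class Advertisement:
    next_hop: str  # hypothetical load balancer name
    med: int       # MED value advertised with the route


def preferred(advertisements):
    """Pick the advertisement with the lowest MED (lower is preferred)."""
    return min(advertisements, key=lambda adv: adv.med)


routes = [
    Advertisement("lvs-backup", BACKUP_MED),
    Advertisement("lvs-primary", PRIMARY_MED),
]
print(preferred(routes).next_hop)
```

If the primary withdraws its route, only the backup's advertisement remains and it is selected, so failover should need no router-side policy to rank the two.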

What's left to be done here is changing routing policies on the routers AFAIU. @ayounsi

We need to clean up this specific term, now that the LVS hosts advertise the MED themselves.

delete policy-options policy-statement LVS_import term secondary
ayounsi assigned this task to ema.

Done!