
Deploy pybal with BGP MED support (for primary/backup) in production
Closed, Resolved · Public

Description

Pybal was extended with support for the BGP MED (Multi-Exit Discriminator, aka metric) attribute in commit a6ae55449d6986d469e808d8b87d21158f59ccda. I've tested it against a Quagga instance on pybal-test2002, using a pybal instance on pybal-test2003.

It would be good to get this deployed in production, replacing the current routing policies on the core routers and allowing pybal to indicate primary/backup status itself. This could then even be driven from, e.g., etcd.
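
For readers unfamiliar with the attribute: RFC 4271 defines MULTI_EXIT_DISC as an optional, non-transitive path attribute (type code 4) whose value is a 4-octet unsigned integer. The sketch below encodes and decodes it the way it would appear among the path attributes of an UPDATE message. It is a standalone illustration written against the RFC, not pybal's own implementation from the commit above; `encode_med` and `decode_med` are hypothetical helper names, and only the 1-octet length form (Extended Length bit cleared) is handled.

```python
import struct

# RFC 4271, section 4.3: MULTI_EXIT_DISC is path attribute type code 4.
# It is optional (flag 0x80 set) and non-transitive (flag 0x40 cleared).
ATTR_FLAG_OPTIONAL = 0x80
ATTR_TYPE_MED = 4


def encode_med(med: int) -> bytes:
    """Encode a MULTI_EXIT_DISC path attribute (hypothetical helper)."""
    if not 0 <= med <= 0xFFFFFFFF:
        raise ValueError("MED must fit in an unsigned 32-bit integer")
    value = struct.pack("!I", med)  # 4-octet value, network byte order
    # Flags, type code, 1-octet length (Extended Length bit not set), value.
    return struct.pack("!BBB", ATTR_FLAG_OPTIONAL, ATTR_TYPE_MED, len(value)) + value


def decode_med(attr: bytes) -> int:
    """Decode an attribute produced by encode_med (1-octet length form only)."""
    if len(attr) != 7:
        raise ValueError("expected a 7-octet MULTI_EXIT_DISC attribute")
    flags, type_code, length = struct.unpack("!BBB", attr[:3])
    if not flags & ATTR_FLAG_OPTIONAL or type_code != ATTR_TYPE_MED or length != 4:
        raise ValueError("not a MULTI_EXIT_DISC attribute")
    return struct.unpack("!I", attr[3:])[0]


print(encode_med(100).hex())  # 80040400000064
```

The attribute is only seven octets on the wire, which is why adding it to pybal's announcements is a small change compared to maintaining per-router import policies.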

Event Timeline

mark created this task. May 17 2017, 10:54 AM
Restricted Application added a project: Operations. May 17 2017, 10:54 AM
Restricted Application added a subscriber: Aklapper.
mark moved this task from Triage to LoadBalancer on the Traffic board. May 17 2017, 4:09 PM
ayounsi moved this task from Backlog to Configuration on the netops board. Jun 27 2017, 2:52 PM
ayounsi moved this task from Configuration to Watching on the netops board. Jul 12 2017, 7:23 PM
ema triaged this task as Medium priority. Jul 18 2017, 1:26 PM
mark moved this task from Backlog to Blocked on the Pybal board. Aug 9 2017, 9:01 PM

Change 378920 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/pybal@1.14] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/378920

Change 378920 merged by Ema:
[operations/debs/pybal@1.14] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/378920

Change 380459 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/pybal@master] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/380459

Change 380459 merged by Ema:
[operations/debs/pybal@master] 1.14.0: prometheus metrics, BGP MED, bugfixes

https://gerrit.wikimedia.org/r/380459

Change 380516 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] pybal: BGP MED configuration

https://gerrit.wikimedia.org/r/380516

Mentioned in SAL (#wikimedia-operations) [2017-10-09T09:51:36Z] <ema> test setting bgp med on lvs3001/3003 T165584

Change 380516 merged by Ema:
[operations/puppet@production] pybal: BGP MED configuration

https://gerrit.wikimedia.org/r/380516

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:26:39Z] <ema> restart pybal on esams load balancers to pick up bgp-med config change T165584

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:31:09Z] <ema> restart pybal on ulsfo load balancers to pick up bgp-med config change T165584

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:38:03Z] <ema> restart pybal on codfw load balancers to pick up bgp-med config change T165584

Mentioned in SAL (#wikimedia-operations) [2017-10-09T12:47:27Z] <ema> restart pybal on eqiad load balancers to pick up bgp-med config change T165584

ema added a subscriber: ema. Oct 9 2017, 12:53 PM

All load balancers are now using BGP MED. Primaries send the MED attribute with a value of 0; backups send 100.

AFAIU, what's left to be done here is changing the routing policies on the routers. @ayounsi
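
With primaries advertising MED 0 and backups MED 100, a router that receives the same service prefix from both should prefer the primary, since the lower MED wins, and fall back to the backup once the primary withdraws the route. The sketch below models only that MED step of BGP best-path selection. It assumes both announcements reach the router with equal local preference, AS path, and origin, and from the same neighboring AS (by default routers compare MED only between routes from the same neighbor AS). The `Route` type, the hostnames, and the prefix (from the 198.51.100.0/24 documentation range) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# MED values from the comment above: primaries send 0, backups send 100.
PRIMARY_MED = 0
BACKUP_MED = 100


@dataclass(frozen=True)
class Route:
    prefix: str    # service IP announced by pybal (hypothetical example)
    next_hop: str  # load balancer that advertised the route (hypothetical name)
    med: int


def best_route(routes: Iterable[Route]) -> Optional[Route]:
    """Return the route with the lowest MED, or None if nothing is announced.

    This models only the MED tie-breaker, assuming all earlier
    best-path steps (local preference, AS path, origin) are equal.
    """
    return min(routes, key=lambda r: r.med, default=None)


routes = [
    Route("198.51.100.1/32", "lvs-primary.example", PRIMARY_MED),
    Route("198.51.100.1/32", "lvs-backup.example", BACKUP_MED),
]
print(best_route(routes).next_hop)      # lvs-primary.example
print(best_route(routes[1:]).next_hop)  # lvs-backup.example (primary withdrew its route)
```

Because the preference is carried in the announcement itself, the routers no longer need a separate import policy term to demote the backup.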

We need to clean up this specific term now that the LVSes advertise the MED themselves:

delete policy-options policy-statement LVS_import term secondary

ayounsi closed this task as Resolved. Oct 20 2017, 4:55 PM
ayounsi assigned this task to ema.

Done!