
Optimise WMF WAN Network Configuration
Open, Low, Public

Description

Background

Several questions about the WAN configuration between WMF POPs remain open, in particular:

T167841: Cleanup confed BGP peerings and policies
T200277: OSPF metrics

In the recent past we've made at least 2 changes to our protocol configuration to address particular niggles that cropped up, for instance:

T295672: Use next-hop self on iBGP peerings to deal with dual Equinix IXP ports in eqiad
T283163: Copied IGP cost to BGP MED to overcome fact that sub-AS path length is not considered in BGP selection
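
To illustrate why the MED workaround in T283163 was needed, here is a minimal Python sketch of a heavily simplified BGP decision process. It assumes only three steps (local preference, AS-path length, MED), compares MED unconditionally, and uses made-up ASNs and path names; none of it is taken from our actual config. The one protocol fact it relies on is that confederation segments (AS_CONFED_SEQUENCE) do not count towards AS-path length.

```python
from dataclasses import dataclass
from typing import List, Tuple

SEQ = "AS_SEQUENCE"
CONFED_SEQ = "AS_CONFED_SEQUENCE"


@dataclass
class Path:
    name: str                               # hypothetical label for the path
    segments: List[Tuple[str, List[int]]]   # (segment type, ASNs)
    local_pref: int = 100
    med: int = 0


def as_path_length(path: Path) -> int:
    """Count ordinary AS_SEQUENCE hops only; confed segments count as zero."""
    return sum(len(asns) for kind, asns in path.segments if kind == SEQ)


def selection_key(path: Path) -> Tuple[int, int, int]:
    # Simplified ordering: highest local-pref, then shortest AS path,
    # then lowest MED. Real BGP has more steps than this.
    return (-path.local_pref, as_path_length(path), path.med)


def best_paths(paths: List[Path]) -> List[str]:
    """Return the names of all paths that tie for best under selection_key."""
    best = min(selection_key(p) for p in paths)
    return [p.name for p in paths if selection_key(p) == best]


# Same external prefix (origin AS 64496), learned via one sub-AS hop and
# via two sub-AS hops. All ASNs are illustrative.
near = Path("via-near-site", [(CONFED_SEQ, [65001]), (SEQ, [64496])])
far = Path("via-far-site", [(CONFED_SEQ, [65002, 65003]), (SEQ, [64496])])

print(best_paths([near, far]))  # ['via-near-site', 'via-far-site'] (a tie)

# Copy a made-up IGP cost into MED, as the workaround does.
near.med, far.med = 10, 30
print(best_paths([near, far]))  # ['via-near-site']
```

Without the MED step both paths look identical to this simplified selection, regardless of how many sub-ASes they cross. Moving to full eBGP between sites would instead make the AS-path length step do this work.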

Having discussed this with @ayounsi, we agree that while changes like the above may be required to fix specific operational issues, we need to minimise tweaks made with only one issue, or one site, in mind. It is risky to adjust global protocol parameters without thoroughly considering the wider effects. We also don't want our protocol config/policy to become a complex beast built up from many minor tweaks over time.

Objective

Creating this task to track progress on a holistic review of our WAN configuration, with a goal of producing a new agreed design for the medium/long term.

To be clear, the process may result in very few changes; our current setup might be deemed best. Certainly minimising change and disruption is good, and "change for change's sake" is not what this task is about. Nevertheless, there may be improvements we could make.

Scope

To list off some random ideas to give a sense of what we could do:

  • Devise a new methodology for setting OSPF link costs, to aid expressing intent in terms of path selection.
  • Remove OSPF across the WAN, and replace Confed configuration with full eBGP between sites.
    • Thus making all inter-site routing decisions a matter of BGP policy.
  • Keep the OSPF, but still move to eBGP, with next-hops preserved.
    • Bringing AS-path length into consideration when selecting BGP paths.
  • Move all networks, apart from link and loopback addresses, from OSPF to BGP, and use loopback addresses only as next-hops.
    • This may warrant introducing BGP multipath to properly load balance towards two remote CRs announcing the same prefix.
  • Keep OSPF, but move the WAN transport to MPLS, likely SR-MPLS, to enable traffic-engineering of WAN paths according to policy.
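
As one way to picture the first idea above (a methodology for OSPF link costs), the Python sketch below derives costs from link latency and then runs a shortest-path calculation to check which path those costs actually select. The site names, latencies and the `scale` factor are all hypothetical, and latency-based costing is only one possible methodology, not an agreed one.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical one-way latencies in milliseconds between made-up sites.
LATENCY_MS: Dict[Tuple[str, str], float] = {
    ("A", "B"): 8.0,
    ("B", "C"): 12.0,
    ("A", "C"): 25.0,
    ("C", "D"): 30.0,
}


def cost_from_latency(latency_ms: float, scale: int = 10) -> int:
    """Map latency to an integer link cost (assumed rule: ms * scale, min 1)."""
    return max(1, round(latency_ms * scale))


def build_graph(latencies: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, int]]:
    """Build a symmetric adjacency map of link costs."""
    graph: Dict[str, Dict[str, int]] = {}
    for (a, b), ms in latencies.items():
        cost = cost_from_latency(ms)
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph: Dict[str, Dict[str, int]], src: str, dst: str) -> Tuple[int, List[str]]:
    """Dijkstra's algorithm, returning (total cost, list of sites on the path)."""
    queue: List[Tuple[int, str, List[str]]] = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, link_cost in graph[node].items():
            if neighbour not in seen:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    raise ValueError(f"no path from {src} to {dst}")


graph = build_graph(LATENCY_MS)
print(shortest_path(graph, "A", "C"))  # (200, ['A', 'B', 'C'])
print(shortest_path(graph, "A", "D"))  # (500, ['A', 'B', 'C', 'D'])
```

A check like this could be run against any proposed cost table before it is deployed, to confirm that the selected paths match the intent.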

There are many more things we might consider; the above are only off-the-cuff ideas to give a sense of the scope of the task. We need to arrive at a design that provides deterministic and predictable path selection, and gives us sufficient control. That said, simplicity should remain the paramount consideration: we need to avoid adding layers of complexity just to have "nerd knobs" to play with.

Event Timeline

cmooney triaged this task as Medium priority. Dec 9 2021, 9:29 AM
cmooney created this task.
Restricted Application added a subscriber: Aklapper.
cmooney updated the task description.

Given it came up as part of an incident report, I'll explicitly mention that we need to consider our "network only" POPs, like eqord, as part of this.

The key balance we need to strike is between forcing traffic to the network-only POP (BGP will tend to want to send traffic out via local transit/peering instead) and pushing so much over the WAN link to it that the link maxes out.
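
To make that trade-off concrete, here is a rough Python sketch that decides which traffic to steer over the WAN link to a network-only POP while keeping the link under a utilisation ceiling. Everything in it is hypothetical: the destination-group names, the demand figures, the link capacity, the 80% ceiling and the greedy "largest first" rule are illustrative assumptions, not a description of how we steer traffic today.

```python
from typing import Dict, List, Tuple


def split_traffic(
    demands_mbps: Dict[str, int],
    wan_capacity_mbps: int,
    max_utilisation_pct: int = 80,
) -> Tuple[List[str], List[str], int]:
    """Greedily pick destination groups to send via the network-only POP.

    Returns (groups sent over the WAN, groups left on local transit/peering,
    Mbps used on the WAN link).
    """
    budget = wan_capacity_mbps * max_utilisation_pct // 100
    via_wan: List[str] = []
    via_local: List[str] = []
    used = 0
    # Consider the largest demands first; anything that would push the
    # link past the budget stays on local transit/peering.
    for name, mbps in sorted(demands_mbps.items(), key=lambda kv: kv[1], reverse=True):
        if used + mbps <= budget:
            via_wan.append(name)
            used += mbps
        else:
            via_local.append(name)
    return via_wan, via_local, used


# Made-up demands towards destinations reachable via the network-only POP.
demands = {"dest-group-1": 3000, "dest-group-2": 5000, "dest-group-3": 2000}
print(split_traffic(demands, wan_capacity_mbps=10000))
# (['dest-group-2', 'dest-group-1'], ['dest-group-3'], 8000)
```

Whatever design we settle on would need some equivalent of the `budget` here: a way to express how much traffic the WAN link to the POP is allowed to carry before local exit is preferred.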

akosiaris subscribed.

Removing SRE; this has already been triaged to a more specific SRE subteam.

@cmooney: Removing task assignee as this open task has been assigned for more than two years - see the email sent to all task assignees on 2024-04-15.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

cmooney lowered the priority of this task from Medium to Low. Thu, May 16, 7:04 PM

Thanks. It is very much something we wish to do, but unfortunately other priorities have trumped it for several quarters now.