
OSPF metrics
Open, Low, Public

Description

While turning up the new GTT links, I ended up in a catch-22 situation for the eqdfw-knams OSPF metric.

The eqdfw-knams link needs to have a lower metric than the current primary path (codfw-eqiad + eqiad-esams) so that traffic from codfw to esams prefers that link.
At the same time, it needs a higher metric than the primary path minus the codfw-eqiad leg (eqiad-esams - codfw-eqiad) so that traffic from eqiad still prefers the direct link.
So:
840-320 < x < 320+840
520 < x < 1160

But if the primary eqiad-esams link goes down, its backup has a metric of 1820 (vs. 840); I think this is due to the current policy of adding a weight of 1000 to the backup link.

This means that, to respect the same conditions stated previously, the metric now has to be:
1820-320 < x < 1820+320
1500 < x < 2140

The low-hanging fruit, if we only focus on that part of the OSPF metrics, is to lower the backup eqiad-esams link's metric.
For example, I used:
Primary eqiad-esams: 880
Backup eqiad-esams: 968

880-320 < x < 320+880
560 < x < 1200

With primary eqiad-esams down:
968-320 < x < 968+320
648 < x < 1288

So a metric of 1089 for eqdfw-knams works.
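
For illustration, a minimal Python sketch of the arithmetic above; the 320/880/968 metrics and the 1089 candidate are the values from this task, and the small intra-site hop metrics are ignored:

    # Sanity-check the eqdfw-knams metric windows computed above.
    CODFW_EQIAD = 320                          # codfw-eqiad metric
    EQIAD_ESAMS = {"primary": 880, "backup": 968}

    def window(eqiad_esams_metric):
        # Lower bound: eqiad must keep preferring the direct eqiad-esams link
        # over going through codfw/eqdfw and the new eqdfw-knams link.
        # Upper bound: codfw must prefer eqdfw-knams over codfw-eqiad + eqiad-esams.
        return eqiad_esams_metric - CODFW_EQIAD, eqiad_esams_metric + CODFW_EQIAD

    candidate = 1089
    for state, metric in EQIAD_ESAMS.items():
        low, high = window(metric)
        print(f"{state}: {low} < {candidate} < {high} -> {low < candidate < high}")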

Looking into that issue, I drew a map of all our links with their current metrics.
I noticed that in a few places the current metrics no longer match the latency, one link is asymmetrical, prefixing the backup links' metric with a "1" doesn't work everywhere, and special cases are not explained.

State of OSPF @ Wikimedia.png (114 KB)

On that diagram, the fourth number for each link is the suggested new metric, based on the newly measured latency and the following rules (see the sketch after the list):

  • Main link: latency*10
  • Backup link: latency*10 + 10%
  • Tunnel: latency*10 + 20%
  • Preferred link: latency*10 - 10%
  • Special link: explain the value in a comment
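
As a rough illustration of those rules, a small helper could look like this (hypothetical, not actual Homer code; the link types and percentages come from the list above):

    # Hypothetical helper applying the metric rules above.
    def suggested_metric(latency_ms, link_type):
        factors = {
            "main": 1.00,       # main link: latency*10
            "backup": 1.10,     # backup link: latency*10 + 10%
            "tunnel": 1.20,     # tunnel: latency*10 + 20%
            "preferred": 0.90,  # preferred link: latency*10 - 10%
        }
        if link_type not in factors:
            # "special" links get a hand-picked metric, explained in a comment
            raise ValueError(f"no rule for link type: {link_type}")
        return round(latency_ms * 10 * factors[link_type])

    # e.g. an 88 ms main link -> 880, and the same link as a backup -> 968
    print(suggested_metric(88, "main"), suggested_metric(88, "backup"))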

Thoughts?

Event Timeline

ayounsi triaged this task as Medium priority. Jul 24 2018, 4:02 PM
ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

The eqdfw-knams link needs to have a lower metric than the current primary path (codfw-eqiad + eqiad-esams) so that traffic from codfw to esams prefers that link.

Could you explain that premise? What are we trying to optimize for?

If a path with an extra hop in eqiad is the lowest-latency path, couldn't that just become our preferred path, despite not being direct? Especially since both of those paths are waves rather than EVPN, i.e. not MPLS-based. But perhaps you're trying to share the load over as many links as possible in the normal case?

Indeed, codfw-eqiad-esams is quite similar in terms of latency to eqdfw-knams. In normal operation it would allow us to share the load.
Only eqiad traffic (most of the esams traffic) uses the direct eqiad-esams link; "secondary" traffic to esams (from codfw, ulsfo, eqsin) would use the eqdfw-knams link.

To a lesser extent (as failover times are very short anyway), it takes eqiad out of the picture for codfw-esams traffic, in case of eqiad failures.

In addition, as the link is new, pushing a small amount of (less critical) traffic over it would help test the link.

ayounsi lowered the priority of this task from Medium to Low. Jan 17 2019, 10:03 PM

Low priority; over to @faidon for feedback.

The idea here is that once we have a sound logic behind the OSPF metrics, we can:
1/ Create the following Netbox circuit custom fields (a rough sketch of how they could be consumed follows point 2/):

  • latency: in ms; this will not change often
  • status: multiple choice between primary, backup, preferred, drained (exact choices TBD)

2/ Remove the physical links from config/common.yaml
(Unfortunately Netbox doesn't support virtual links)
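
As a rough illustration of point 1/, this is how Homer (or any script) could read those proposed fields with pynetbox; the field names, the status weights, and the URL/token are assumptions at this stage:

    # Illustrative only: read the proposed circuit custom fields from Netbox.
    import pynetbox

    nb = pynetbox.api("https://netbox.example.org", token="REDACTED")  # placeholder

    # Assumed weights per status, loosely following the diagram rules above.
    STATUS_FACTOR = {"primary": 1.0, "preferred": 0.9, "backup": 1.1}
    DRAINED_METRIC = 5000  # deliberately high value (see the later proposal below)

    for circuit in nb.circuits.circuits.all():
        latency = circuit.custom_fields.get("latency")  # ms
        status = circuit.custom_fields.get("status")
        if latency is None or status is None:
            continue  # not a transport link managed this way
        if status == "drained":
            metric = DRAINED_METRIC
        else:
            metric = round(latency * 10 * STATUS_FACTOR[status])
        print(circuit.cid, status, metric)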

This will have the following benefits:

  • Hide the complexity of link metrics, so any SRE can safely drain a transport link by flipping a Netbox bit and running Homer (see the snippet after this list)
    • Opens the way to automatically drain links preemptively before provider maintenance, where a 3rd-party tool would do this bit flip
  • The status field will later down the road be used for draining transit links (using BGP_GRACEFUL_SHUTDOWN or tearing down BGP)
    • Opens the way to fully drain a router, e.g. before an upgrade, after a failure, etc.
    • And similarly, to preemptively disable transits before maintenance
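
For instance, the "bit flip" mentioned above could be as small as this (hypothetical; it assumes the status custom field and a placeholder circuit ID):

    # Hypothetical drain of a single transport circuit before provider maintenance.
    import pynetbox

    nb = pynetbox.api("https://netbox.example.org", token="REDACTED")  # placeholder
    circuit = nb.circuits.circuits.get(cid="example-transport-1")      # placeholder cid
    circuit.update({"custom_fields": {"status": "drained"}})
    # ...then run Homer to push the resulting OSPF metric change.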

Interesting idea! Couple of notes:

  • What do you mean by "virtual links" and Netbox not supporting them? Is that VLANs for our transports over the PtMP VPLS?

Yes, both PtMP VPLS (displayed as 3 links from site X to provider, and not site X to site Y) and GRE tunnels between sites.

  • What do you envision the difference to be between "primary" and "preferred"? (I know you said TBD, but curious :)

TBD, but this is meant to reflect our current logic as shown in the diagram.
Primary would be the default state; preferred would be an override to drain alternate links.

  • It'd be interesting to see what this would look like before we start adding the fields. That may help us figure out what the right values for those fields may be. Would it make sense to list our links in a Phaste or spreadsheet or something and figure out if the output makes sense?

I think a diagram makes more sense for seeing the links in relation to each other. See the task description :)

New proposal! The change compared to T200277#6077728 is to use the following fields (a short sketch of the resulting logic follows the list):

  • metric - keeps things more generic than latency
  • state - choice between
    • default - use the set metric as is
    • preferred - half the set metric
    • drained - sets an OSPF metric of 5000
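
A minimal sketch of that metric/state logic (illustrative only; the actual implementation is in the Homer change linked below):

    # Turn a Netbox (metric, state) pair into the OSPF metric to configure.
    DRAINED_METRIC = 5000

    def effective_metric(metric, state):
        if state == "default":
            return metric           # use the set metric as is
        if state == "preferred":
            return metric // 2      # half the set metric
        if state == "drained":
            return DRAINED_METRIC   # fixed high metric
        raise ValueError(f"unknown state: {state}")

    assert effective_metric(880, "default") == 880
    assert effective_metric(880, "preferred") == 440
    assert effective_metric(880, "drained") == 5000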

Implemented in the below CR and tested on netbox-next.

This will then need a report to alert on any transport that doesn't have a default state.

The initial scope of the task, standardizing the metrics, still needs to be done, but it is not a blocker to making this Netbox-driven.

Change 617603 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/homer/public@master] Configure transport links OSPF based on Netbox data

https://gerrit.wikimedia.org/r/617603

Discussed it with Faidon and created/populated the custom fields in Netbox.

Change 617603 merged by jenkins-bot:
[operations/homer/public@master] Configure transport links OSPF based on Netbox data

https://gerrit.wikimedia.org/r/617603