ospf link-protection
Closed, ResolvedPublic

Description

Follow up from a conversation with @faidon.

As routers already have "load-balance per-packet" configured, the only step needed is to add the "link-protection" statement under "protocols ospf[3] <interface>" for each cross-DC link (the links most likely to get cut).

Then confirm the correct backup route is properly installed by looking at "show ospf backup spf" as well as "show route forwarding-table destination xxx".

We could also lower the BFD timers (BFD currently waits for 3*300ms before considering a link down) to speed up failover on the MX routers, depending on how fast we want failover to happen.
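
As a rough illustration of the timer arithmetic above, the sketch below computes the BFD detection time as interval times multiplier. The function name and the 100 ms candidate value are hypothetical and only there for comparison; the 300 ms / 3 figures are the ones quoted in this task.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate time before BFD declares a session down:
    `multiplier` consecutive missed packets at `interval_ms` spacing."""
    return interval_ms * multiplier


# Timers quoted in this task: 3 * 300 ms
current_ms = bfd_detection_time_ms(300, 3)

# Hypothetical lower interval, for comparison only (not a recommendation)
candidate_ms = bfd_detection_time_ms(100, 3)

print(f"current: {current_ms} ms, candidate: {candidate_ms} ms")
```

Detection time is only one part of failover; with link-protection the pre-installed backup next-hop is what keeps the repair step short once the failure is detected.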

Event Timeline

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Change 698512 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Add OSPF link-protection to all P2P links

https://gerrit.wikimedia.org/r/698512

Mentioned in SAL (#wikimedia-operations) [2021-06-14T12:17:38Z] <XioNoX> configure OSPF link-protection on cr3-ulsfo:xe-0/1/1 - T167306

Mentioned in SAL (#wikimedia-operations) [2021-06-14T12:37:16Z] <XioNoX> configure OSPF link-protection on cr3/4-ulsfo - T167306

Deployed to cr3-ulsfo and cr4-ulsfo. Some interesting (and expected) findings:

No-brainer: the backup from cr3-ulsfo to mr1-ulsfo is via cr4-ulsfo.

mr1-ulsfo
cr3-ulsfo# run show ospf backup spf 198.35.26.194 
Topology default results:

Area 0.0.0.0 results:

198.35.26.194
  Self to Destination Metric: 20000
  Parent Node: 198.35.26.192
  Primary next-hop: et-0/0/1.401 via 198.35.26.199
  Backup next-hop: ae0.2 via 198.35.26.197
  Backup Neighbor: 198.35.26.194 via: Direct
    Neighbor to Destination Metric: 0, Neighbor to Self Metric: 20000
    Self to Neighbor Metric: 20000, Backup preference: 0x0
    Not eligible, Reason: Primary next-hop link fate sharing
  Backup Neighbor: 198.35.26.193 via: Direct
    Neighbor to Destination Metric: 20000, Neighbor to Self Metric: 2
    Self to Neighbor Metric: 2, Backup preference: 0x0
    Eligible, Reason: Contributes backup next-hop
  Backup Neighbor: 208.80.154.198 via: Direct
    Neighbor to Destination Metric: 20510, Neighbor to Self Metric: 510
    Self to Neighbor Metric: 510, Backup preference: 0x0
    Not evaluated, Reason: Interface is already covered

cr3-ulsfo# run show route forwarding-table destination 198.35.26.194 
Routing table: default.inet
Internet:
Enabled protocols: Bridging, 
Destination        Type RtRef Next hop           Type Index    NhRef Netif
198.35.26.194/32   user     0                    ulst  1048584     3
                              198.35.26.199      ucst      779     3 et-0/0/1.401
                              198.35.26.197      ucst      701   305 ae0.2
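
The confirmation step (checking that a backup next-hop is present in "show ospf backup spf") could be scripted along these lines. This is a minimal sketch that assumes the text layout shown above; the regular expression and the `next_hops` helper are hypothetical and not part of any Juniper tooling.

```python
import re

# Excerpt of the `show ospf backup spf` output pasted above
SAMPLE = """\
198.35.26.194
  Self to Destination Metric: 20000
  Parent Node: 198.35.26.192
  Primary next-hop: et-0/0/1.401 via 198.35.26.199
  Backup next-hop: ae0.2 via 198.35.26.197
"""

# Matches lines such as "Primary next-hop: <interface> via <gateway>"
NEXT_HOP = re.compile(r"^\s*(Primary|Backup) next-hop: (\S+) via (\S+)\s*$", re.M)


def next_hops(output: str) -> dict:
    """Map 'Primary' / 'Backup' to the (interface, gateway) pair found in the output."""
    return {kind: (ifname, gw) for kind, ifname, gw in NEXT_HOP.findall(output)}


hops = next_hops(SAMPLE)
has_backup = "Backup" in hops
print(hops, has_backup)
```

A destination whose output has no "Backup next-hop" line would simply yield a dictionary without the "Backup" key.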

For cr2-eqord, cr3-ulsfo doesn't want to use cr4-ulsfo as backup, because cr4-ulsfo's own next hop toward cr2-eqord is cr3-ulsfo (see "Path loops" in the output).

cr3-ulsfo# run show ospf backup spf 208.80.154.198 
Topology default results:

Area 0.0.0.0 results:

208.80.154.198
  Self to Destination Metric: 510
  Parent Node: 198.35.26.192
  Primary next-hop: xe-0/1/1.0 via 198.35.26.209
  Backup Neighbor: 208.80.154.198 via: Direct
    Neighbor to Destination Metric: 0, Neighbor to Self Metric: 510
    Self to Neighbor Metric: 510, Backup preference: 0x0
    Not eligible, Reason: Primary next-hop link fate sharing
  Backup Neighbor: 198.35.26.193 via: Direct
    Neighbor to Destination Metric: 512, Neighbor to Self Metric: 2
    Self to Neighbor Metric: 2, Backup preference: 0x0
    Not eligible, Reason: Path loops
  Backup Neighbor: 198.35.26.194 via: Direct
    Neighbor to Destination Metric: 20510, Neighbor to Self Metric: 20000
    Self to Neighbor Metric: 20000, Backup preference: 0x0
    Not eligible, Reason: Path loops

On the other hand, cr4-ulsfo has cr3-ulsfo as its next hop for cr2-eqord and adds cr1-codfw as backup, as there is no loop there.

cr4-ulsfo# run show ospf backup spf 208.80.154.198 
Topology default results:

Area 0.0.0.0 results:

208.80.154.198
  Self to Destination Metric: 512
  Parent Node: 198.35.26.192
  Primary next-hop: ae0.2 via 198.35.26.196
  Backup next-hop: xe-0/1/1.0 via 198.35.26.203
  Backup Neighbor: 208.80.153.192 via: Direct
    Neighbor to Destination Metric: 245, Neighbor to Self Metric: 390
    Self to Neighbor Metric: 390, Backup preference: 0x0
    Eligible, Reason: Contributes backup next-hop
  Backup Neighbor: 208.80.153.198 via: Direct
    Neighbor to Destination Metric: 250, Neighbor to Self Metric: 400
    Self to Neighbor Metric: 400, Backup preference: 0x0
    Not evaluated, Reason: Interface is already covered
  Backup Neighbor: 198.35.26.192 via: Direct
    Neighbor to Destination Metric: 510, Neighbor to Self Metric: 2
    Self to Neighbor Metric: 2, Backup preference: 0x0
    Not evaluated, Reason: Interface is already covered
  Backup Neighbor: 103.102.166.130 via: Direct
    Neighbor to Destination Metric: 2247, Neighbor to Self Metric: 2392
    Self to Neighbor Metric: 2392, Backup preference: 0x0
    Not evaluated, Reason: Interface is already covered
  Backup Neighbor: 198.35.26.194 via: Direct
    Neighbor to Destination Metric: 20510, Neighbor to Self Metric: 20000
    Self to Neighbor Metric: 20000, Backup preference: 0x0
    Not evaluated, Reason: Interface is already covered
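
The "Path loops" and "Eligible" verdicts above are consistent with the basic loop-free alternate inequality from RFC 5286: a neighbor N is loop-free for destination D when dist(N, D) < dist(N, S) + dist(S, D). The sketch below plugs in the metrics from the three outputs. It only checks that inequality and is not a description of how Junos evaluates candidates (fate sharing and "already covered" are separate checks); the labels and function name are illustrative.

```python
def is_loop_free(n_to_d: int, n_to_s: int, s_to_d: int) -> bool:
    """RFC 5286 basic loop-free condition:
    dist(N, D) < dist(N, S) + dist(S, D)."""
    return n_to_d < n_to_s + s_to_d


# (label, Neighbor to Destination, Neighbor to Self, Self to Destination),
# all values taken from the outputs above
cases = [
    ("cr3-ulsfo -> 198.35.26.194 via 198.35.26.193", 20000, 2, 20000),
    ("cr3-ulsfo -> 208.80.154.198 via 198.35.26.193", 512, 2, 510),
    ("cr3-ulsfo -> 208.80.154.198 via 198.35.26.194", 20510, 20000, 510),
    ("cr4-ulsfo -> 208.80.154.198 via 208.80.153.192", 245, 390, 512),
]

results = {label: is_loop_free(nd, ns, sd) for label, nd, ns, sd in cases}
for label, ok in results.items():
    print(label, "loop-free" if ok else "path loops")
```

For cr3-ulsfo toward cr2-eqord, both candidates land exactly on the boundary (512 = 2 + 510 and 20510 = 20000 + 510), so the strict inequality fails and neither qualifies.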

So link-protection won't be useful on all the links, but since it will be useful on some of them, there is an advantage in rolling it out to all the routers.
For example, failover time between the two eqiad-codfw links will improve.

Mentioned in SAL (#wikimedia-operations) [2021-06-15T06:10:39Z] <XioNoX> roll OSPF link-protection to all routers - T167306

Change 698512 merged by Ayounsi:

[operations/homer/public@master] Add OSPF link-protection to all P2P links

https://gerrit.wikimedia.org/r/698512

Closed! After 4 years and 1 week.