configure BGP route damping on IX sessions
Closed, Resolved · Public · 0 Story Points

Description

I am seeing a lot of BGP flapping events for AS 61955 in https://librenms.wikimedia.org/eventlog; I found them by chance while checking another issue. I'm not sure whether this is already handled by other means (icinga didn't show anything). If it is, sorry for the spam :)

Details

Related Gerrit Patches:
operations/homer/public : master · Add BGP prefix damping to IX policies

Event Timeline

elukey created this task. May 3 2019, 7:17 AM
Restricted Application added a project: Operations. May 3 2019, 7:18 AM
Restricted Application added a subscriber: Aklapper.

We don't monitor our IX peers much.
We should probably configure BGP route damping (https://www.juniper.net/documentation/en_US/junos/topics/usage-guidelines/policy-using-routing-policies-to-damp-bgp-route-flapping.html) to reduce router churn and improve user experience.

Dzahn triaged this task as Normal priority. May 3 2019, 8:40 PM
Dzahn added a project: observability.
ayounsi renamed this task from "cr2-esams: BGP flapping for AS 61955 (ipv4 and ipv6)" to "configure BGP route damping on IX sessions". Sep 24 2019, 7:29 PM
ayounsi claimed this task. Sep 24 2019, 7:36 PM
ayounsi added subscribers: mark, faidon.

Keeping the default damping settings (per the doc), here is what I think we should push to our routers:

[edit protocols bgp group IX4]
+    damping;
[edit protocols bgp group IX6]
+    damping;
[edit policy-options policy-statement BGP_IXP_in]
     term rpki-invalids { ... }
+    /* T222424 */
+    term damping {
+        then damping default;
+    }
[edit policy-options]
+   /* T222424 */
+   damping default {
+       half-life 15;
+       reuse 750;
+       suppress 3000;
+       max-suppress 60;
+   }

The same would go on private peers as well.

Note that this would only apply to the prefixes we learn from our peers, not to the ones we advertise to them. This means a flapping session could still cause issues unless the other side has damping enabled.

This could be applied to transits as well, but unless we have some last resort default routes, there is a (low) risk of black-holing traffic.

@faidon / @mark thoughts?

The RIPE Routing WG recommends a suppress value of 6000. If we go with that, we may also want to increase the reuse value, but I couldn't find any advice on that.

Great doc, thanks!
We can use 2000 for reuse. The following would then happen:

A prefix flaps until its penalty reaches 6000, then becomes stable:
15 min -> 3000
30 min -> 1500 (unblocked, as 1500 < 2000)

Accepting a prefix 30 minutes after an event ends seems reasonable to me.
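
The decay arithmetic above can be reproduced with a short Python sketch. It assumes a pure exponential decay in which the penalty halves every half-life, as described in RFC 2439; real routers decay the penalty in discrete steps, so actual timings may differ slightly. The function names are hypothetical and only serve the illustration.

```python
import math


def decayed_penalty(penalty: float, minutes: float, half_life: float = 15.0) -> float:
    """Penalty left after `minutes` without further flaps (assumed pure exponential decay)."""
    return penalty * 0.5 ** (minutes / half_life)


def minutes_until_reuse(penalty: float, reuse: float, half_life: float = 15.0) -> float:
    """Minutes of stability needed before the penalty decays down to the reuse threshold."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)


if __name__ == "__main__":
    # Values from the thread: penalty 6000, half-life 15 minutes, reuse 2000.
    for minutes in (15, 30):
        print(minutes, "min ->", decayed_penalty(6000, minutes))
    print("reuse reached after", round(minutes_until_reuse(6000, 2000), 1), "min")
```

Under this continuous model the penalty crosses 2000 after roughly 24 minutes, which is consistent with the 30-minute figure obtained by checking at whole half-life intervals.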

jbond added a comment. Oct 1 2019, 9:11 AM

Yep seems reasonable to me

Updated change incorporating the feedback above:

[edit protocols bgp group IX4]
+    damping;
[edit protocols bgp group IX6]
+    damping;
[edit policy-options policy-statement BGP_IXP_in]
     term rpki-invalids { ... }
+    /* T222424 */
+    term damping {
+        then damping default;
+    }
[edit policy-options]
+   /* T222424 */
+   damping default {
+       half-life 15;
+       reuse 2000;
+       suppress 6000;
+       max-suppress 60;
+   }
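
As a rough sanity check on how max-suppress interacts with the other values, the sketch below computes the penalty ceiling implied by a set of damping parameters. It assumes the RFC 2439 relationship, where the ceiling is the reuse threshold multiplied by 2^(max-suppress / half-life), so that a route is never suppressed for longer than max-suppress once it stops flapping. Whether a given Junos release applies exactly this cap is an assumption here, and `penalty_ceiling` is a hypothetical helper name.

```python
def penalty_ceiling(reuse: float, max_suppress: float, half_life: float) -> float:
    """Highest penalty that can still decay to `reuse` within `max_suppress` minutes.

    Assumes the RFC 2439 formula: ceiling = reuse * 2 ** (max_suppress / half_life).
    """
    return reuse * 2 ** (max_suppress / half_life)


if __name__ == "__main__":
    # Original proposal: reuse 750, max-suppress 60, half-life 15.
    print("reuse 750 :", penalty_ceiling(750, 60, 15))
    # Updated proposal: reuse 2000, max-suppress 60, half-life 15.
    print("reuse 2000:", penalty_ceiling(2000, 60, 15))
```

With max-suppress at four half-lives, the ceiling is 16 times the reuse value in both cases, so the suppress threshold of 6000 sits well below the ceiling under the updated settings.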

Will push it shortly to ulsfo/eqdfw/eqord. Then if all good, tomorrow to eqsin/eqiad/esams.

Mentioned in SAL (#wikimedia-operations) [2019-10-02T18:15:12Z] <XioNoX> add BGP route damping on IX sessions - ulsfo - T222424

For the record:

cr4-ulsfo> show bgp neighbor | match "Suppressed due to damping"| except "    0"                      
    Suppressed due to damping:    1
    Suppressed due to damping:    1
    Suppressed due to damping:    27
    Suppressed due to damping:    1
    Suppressed due to damping:    1
    Suppressed due to damping:    2
    Suppressed due to damping:    2
    Suppressed due to damping:    3
    Suppressed due to damping:    1
    Suppressed due to damping:    1

This is out of ~120 BGP sessions; the 27 is out of ~50000 prefixes advertised by that peer.

Mentioned in SAL (#wikimedia-operations) [2019-10-02T18:25:04Z] <XioNoX> add BGP route damping on IX sessions - eqdfw - T222424

Mentioned in SAL (#wikimedia-operations) [2019-10-02T18:28:06Z] <XioNoX> add BGP route damping on IX sessions - eqord - T222424

eqord:

Suppressed due to damping:    4
Suppressed due to damping:    4
Suppressed due to damping:    1
Suppressed due to damping:    1

eqdfw:

Suppressed due to damping:    1
Suppressed due to damping:    1
Suppressed due to damping:    1

All acceptable values. But it also means that enabling damping on transit links might cut us off from some prefixes.

Mentioned in SAL (#wikimedia-operations) [2019-10-07T17:22:10Z] <XioNoX> add BGP route damping on IX sessions - eqsin - T222424

Mentioned in SAL (#wikimedia-operations) [2019-10-07T17:27:08Z] <XioNoX> add BGP route damping on IX sessions - esams - T222424

Mentioned in SAL (#wikimedia-operations) [2019-10-07T17:28:19Z] <XioNoX> add BGP route damping on IX sessions - eqiad - T222424

ayounsi closed this task as Resolved. Oct 7 2019, 5:29 PM

All done!

Change 541367 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/homer/public@master] Add BGP prefix damping to IX policies

https://gerrit.wikimedia.org/r/541367

Change 541367 merged by Ayounsi:
[operations/homer/public@master] Add BGP prefix damping to IX policies

https://gerrit.wikimedia.org/r/541367