I am seeing a log of BGP flapping events for AS 61955 in https://librenms.wikimedia.org/eventlog; I found it by chance while checking another issue. I'm not sure whether it is already handled by other means (icinga didn't show anything); if so, sorry for the spam :)
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Add BGP prefix damping to IX policies | operations/homer/public | master | +12 -0 |
Event Timeline
We don't monitor our IX peers much.
We should probably configure BGP route damping (https://www.juniper.net/documentation/en_US/junos/topics/usage-guidelines/policy-using-routing-policies-to-damp-bgp-route-flapping.html) to reduce router churn and improve user experience.
Keeping the default damping settings (per the doc), here is what I think we should push to our routers:
[edit protocols bgp group IX4]
+    damping;
[edit protocols bgp group IX6]
+    damping;
[edit policy-options policy-statement BGP_IXP_in]
     term rpki-invalids { ... }
+    /* T222424 */
+    term damping {
+        then damping default;
+    }
[edit policy-options]
+   /* T222424 */
+   damping default {
+       half-life 15;
+       reuse 750;
+       suppress 3000;
+       max-suppress 60;
+   }
The same should be applied to private peers as well.
Note that this only applies to the prefixes we learn from our peers, not the ones we advertise to them. This means a flapping session could still cause issues unless the other side has damping enabled.
This could be applied to transits as well, but unless we have some last resort default routes, there is a (low) risk of black-holing traffic.
The RIPE routing-WG recommends a suppress value of 6000. If we go with that, we may also want to increase the reuse value, but I couldn't find any advice on that.
Great doc, thanks!
We can use 2000 for reuse; the following will then happen:
A prefix flaps until its penalty reaches 6000, then becomes stable:
15 min -> 3000
30 min -> 1500 (unblocked as < 2000 )
Accepting a prefix 30min after an event ends seems reasonable to me.
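For reference, the decay arithmetic above can be sketched in a few lines of Python. This is only a rough model, assuming the penalty halves every half-life with pure exponential decay; the function names are made up for illustration, and the exact moment a router re-advertises a prefix depends on how often it re-evaluates the penalty.

```python
import math

def decayed_penalty(initial, minutes, half_life=15.0):
    """Penalty left after `minutes` of stability, halving every `half_life` minutes."""
    return initial * 0.5 ** (minutes / half_life)

def minutes_until_reuse(initial, reuse, half_life=15.0):
    """Minutes of stability needed for the penalty to decay down to `reuse`."""
    if initial <= reuse:
        return 0.0
    return half_life * math.log2(initial / reuse)

# Values from this task: penalty 6000, half-life 15, reuse 2000.
for t in (0, 15, 30):
    print(t, "min ->", round(decayed_penalty(6000, t)))
print("crosses reuse after ~%.1f min" % minutes_until_reuse(6000, 2000))
```

Under this model the penalty crosses the 2000 threshold a bit before the 30 minute mark, which is consistent with the prefix being unblocked by then.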
Updated change incorporating the feedback above:
[edit protocols bgp group IX4]
+    damping;
[edit protocols bgp group IX6]
+    damping;
[edit policy-options policy-statement BGP_IXP_in]
     term rpki-invalids { ... }
+    /* T222424 */
+    term damping {
+        then damping default;
+    }
[edit policy-options]
+   /* T222424 */
+   damping default {
+       half-life 15;
+       reuse 2000;
+       suppress 6000;
+       max-suppress 60;
+   }
I will push it shortly to ulsfo/eqdfw/eqord. Then, if all is good, to eqsin/eqiad/esams tomorrow.
Mentioned in SAL (#wikimedia-operations) [2019-10-02T18:15:12Z] <XioNoX> add BGP route damping on IX sessions - ulsfo - T222424
For the record:
cr4-ulsfo> show bgp neighbor | match "Suppressed due to damping"| except " 0"
Suppressed due to damping: 1
Suppressed due to damping: 1
Suppressed due to damping: 27
Suppressed due to damping: 1
Suppressed due to damping: 1
Suppressed due to damping: 2
Suppressed due to damping: 2
Suppressed due to damping: 3
Suppressed due to damping: 1
Suppressed due to damping: 1
This is out of ~120 BGP sessions; the 27 is out of ~50000 prefixes advertised by this peer.
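To put those numbers in perspective, here is a small Python sketch that tallies the counters from the CLI output pasted above. The helper name is hypothetical, and the ~50000 figure is the approximate prefix count quoted in this comment, not a measured value.

```python
import re

def suppressed_counts(cli_output):
    """Extract the per-session 'Suppressed due to damping' counters as integers."""
    return [int(n) for n in re.findall(r"Suppressed due to damping:\s*(\d+)", cli_output)]

# Output copied from the cr4-ulsfo command above (zero counters already filtered out).
sample = """
Suppressed due to damping: 1
Suppressed due to damping: 1
Suppressed due to damping: 27
Suppressed due to damping: 1
Suppressed due to damping: 1
Suppressed due to damping: 2
Suppressed due to damping: 2
Suppressed due to damping: 3
Suppressed due to damping: 1
Suppressed due to damping: 1
"""

counts = suppressed_counts(sample)
print("sessions with suppressed prefixes:", len(counts))
print("total suppressed prefixes:", sum(counts))
# Share of the worst peer's ~50000 prefixes that are suppressed (approximate).
print("worst peer share: %.3f%%" % (100 * max(counts) / 50000))
```

So only a handful of sessions are affected, and even the worst peer has well under 0.1% of its prefixes suppressed.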
Mentioned in SAL (#wikimedia-operations) [2019-10-02T18:25:04Z] <XioNoX> add BGP route damping on IX sessions - eqdfw - T222424
Mentioned in SAL (#wikimedia-operations) [2019-10-02T18:28:06Z] <XioNoX> add BGP route damping on IX sessions - eqord - T222424
eqord:
Suppressed due to damping: 4
Suppressed due to damping: 4
Suppressed due to damping: 1
Suppressed due to damping: 1
eqdfw:
Suppressed due to damping: 1
Suppressed due to damping: 1
Suppressed due to damping: 1
All acceptable values. But it also means that enabling damping on transit links might cut us off from some prefixes.
Mentioned in SAL (#wikimedia-operations) [2019-10-07T17:22:10Z] <XioNoX> add BGP route damping on IX sessions - eqsin - T222424
Mentioned in SAL (#wikimedia-operations) [2019-10-07T17:27:08Z] <XioNoX> add BGP route damping on IX sessions - esams - T222424
Mentioned in SAL (#wikimedia-operations) [2019-10-07T17:28:19Z] <XioNoX> add BGP route damping on IX sessions - eqiad - T222424
Change 541367 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/homer/public@master] Add BGP prefix damping to IX policies
Change 541367 merged by Ayounsi:
[operations/homer/public@master] Add BGP prefix damping to IX policies