Configure interface damping on primary links
Closed, Resolved, Public

Description

This is to prevent connectivity issues caused by flapping inter-DC links.
In the past, such flaps have caused at least spikes of 503s.

The goal is to apply interface damping only on the primary link between 2 sites, where we have at least 2 backup options. That way we minimize the risk of multiple links being down at the same time.

Zayo link between codfw-eqiad
Zayo link between codfw-ulsfo
Level3 link between eqiad-esams

damping {
	enable;
	max-suppress 600; # Even if still flapping, re-enable the interface after 10 min (and start the counters over)
	suppress 2000;    # Counter increases by 1000 at each flap; keep the interface down once it reaches 2000
	half-life 15;     # Halve the penalty counter every 15 s
	reuse 100;        # Bring the interface back up when the counter falls below 100
}

Example counters (and states) after 2 flaps in less than 15s, with the link stable afterwards:
5s = 2000 (down), 15s = 1000, 30s = 500, 45s = 250, 1min = 125, 1m15 = 62.5 (back up)

If the link flaps again after 30s:
5s = 2000 (down), 15s = 1000, 30s = 500, 35s = 1500, 45s = 750, 1min = 375, 1m15 = 187.5, 1m30 = 93.75 (back up)
etc.
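
The counter arithmetic above can be checked with a short script. This is only a minimal sketch of the simplified model described in this task (each flap adds 1000, the penalty halves every half-life, the interface is held down from the suppress threshold until the penalty drops below reuse). It is not Junos's actual implementation, it ignores max-suppress, and the function names are made up for illustration.

```python
import math

# Values taken from the damping stanza in this task.
HALF_LIFE = 15.0       # seconds
SUPPRESS = 2000.0      # hold the interface down once the penalty reaches this
REUSE = 100.0          # release the interface once the penalty drops below this
FLAP_PENALTY = 1000.0  # per-flap increment, as stated in the task's comments


def decay(penalty, elapsed):
    """Exponential decay: the penalty halves every HALF_LIFE seconds."""
    return penalty * 0.5 ** (elapsed / HALF_LIFE)


def seconds_until_reuse(penalty):
    """Time needed for a penalty to decay down to the REUSE threshold."""
    if penalty <= REUSE:
        return 0.0
    return HALF_LIFE * math.log2(penalty / REUSE)


def simulate(flap_times, sample_times):
    """Return (time, penalty, suppressed) for each requested sample time.

    Simplified model only: max-suppress is not taken into account.
    """
    # Process flaps (kind 0) before samples (kind 1) when times are equal.
    events = sorted([(t, 0) for t in flap_times] + [(t, 1) for t in sample_times])
    penalty, last_t, suppressed = 0.0, 0.0, False
    rows = []
    for t, kind in events:
        penalty = decay(penalty, t - last_t)
        last_t = t
        if suppressed and penalty < REUSE:
            suppressed = False
        if kind == 0:
            penalty += FLAP_PENALTY
            if penalty >= SUPPRESS:
                suppressed = True
        else:
            rows.append((t, penalty, suppressed))
    return rows


if __name__ == "__main__":
    # Two flaps at t=0, then a third flap 30 s later.
    for t, penalty, suppressed in simulate([0, 0, 30], [15, 30, 45, 60, 75, 90]):
        print(f"{t:>3}s penalty={penalty:7.2f} {'down' if suppressed else 'up'}")
```

Under this model, a penalty of 2000 takes roughly 65 s to decay to the reuse threshold, so the fixed 15 s sampling steps used in the examples slightly overstate the time the interface stays down.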

In terms of monitoring, damped interfaces will report as down, which our existing Icinga checks should alert on.

More doc:
https://www.juniper.net/documentation/en_US/junos/topics/concept/physical-interface-damping.html
https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/damping-edit-interfaces.html

Event Timeline

ayounsi triaged this task as Medium priority. Jun 5 2018, 8:18 AM
ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

Looking at doing this Wednesday August 1st, 3 PM UTC, 1h expected.

1 link at a time, only on the primary of the redundant ones, and outside link maintenance.

Edit: the fine print says this is available starting with Junos 14.2, so it has to wait until the new cr3-esams and cr-3/4-ulsfo are ready, plus an upgrade of the codfw/eqiad routers.

Mentioned in SAL (#wikimedia-operations) [2019-09-17T21:01:21Z] <XioNoX> enable interface damping on primary eqiad-esams link (eqiad side) - T196432

Mentioned in SAL (#wikimedia-operations) [2019-09-18T21:09:20Z] <XioNoX> enable damping on codfw-ulsfo link - T196432

Mentioned in SAL (#wikimedia-operations) [2019-09-18T21:13:40Z] <XioNoX> enable damping on primary codfw-eqiad link - T196432

All primary links of all transport pairs now have damping configured.