
Configure interface damping on primary links
Closed, Resolved · Public


This is to prevent connectivity issues caused by flapping inter-DC links.
In the past, such flaps have caused at least spikes of 503s.

The goal is to apply interface damping only on the primary link between 2 sites, and only where we have at least 2 backup options. That way we minimize the risk of multiple links being down at the same time.

Zayo link between codfw-eqiad
Zayo link between codfw-ulsfo
Level3 link between eqiad-esams

damping {
	max-suppress 600; # Even if still flapping, re-enable the interface after 10min (and start the counters over)
	suppress 2000;    # The counter increases by 1000 at each flap; keep the interface down once it reaches 2000
	half-life 15;     # Halve the penalty counter every 15s
	reuse 100;        # Bring the interface back up when the counter falls below 100
}

Example counters (and states) after 2 flaps in less than 15s, with the link stable afterwards:
5s = 2000 (down), 15s = 1000, 30s = 500, 45s = 250, 1min = 125, 1m15 = 62.5 (back up)

If the link flaps again after 30s:
5s = 2000 (down), 15s = 1000, 30s = 500, 35s = 1500, 45s = 750, 1min = 375, 1m15 = 187.5, 1m30 = 93 (back up)
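The counters above can be reproduced with a small simulation. This is only a sketch of the model described in the config comments (1000 added per flap, exponential decay with a 15s half-life, suppress at 2000, reuse below 100), not the actual Junos implementation. The function and constant names are made up for illustration, both initial flaps are treated as simultaneous to match the idealized numbers above, and max-suppress is left out.

```python
# Hypothetical model of the damping counters, based only on the config comments.
HALF_LIFE = 15.0         # seconds for the penalty to halve
SUPPRESS = 2000          # penalty at which the interface is held down
REUSE = 100              # penalty below which the interface comes back up
PENALTY_PER_FLAP = 1000  # added to the counter at each flap


def simulate(flap_times, sample_times):
    """Return (time, penalty, state) for each sample time, in time order."""
    # At equal timestamps "flap" sorts before "sample", so a flap is
    # accounted for before the counter is read.
    events = sorted(
        [(t, "flap") for t in flap_times] + [(t, "sample") for t in sample_times]
    )
    penalty, last, suppressed = 0.0, 0.0, False
    samples = []
    for t, kind in events:
        # Exponential decay since the previous event.
        penalty *= 0.5 ** ((t - last) / HALF_LIFE)
        last = t
        if kind == "flap":
            penalty += PENALTY_PER_FLAP
        if penalty >= SUPPRESS:
            suppressed = True
        elif penalty < REUSE:
            suppressed = False
        if kind == "sample":
            samples.append((t, penalty, "down" if suppressed else "up"))
    return samples


if __name__ == "__main__":
    # Two flaps, then stable.
    for t, p, state in simulate([0, 0], [0, 15, 30, 45, 60, 75]):
        print(f"{t:>3}s  {p:7.1f}  {state}")
    # Same, with one more flap at 30s.
    for t, p, state in simulate([0, 0, 30], [30, 45, 60, 75, 90]):
        print(f"{t:>3}s  {p:7.1f}  {state}")
```

In this model the first scenario comes back up at 1m15 with a counter of 62.5, and the second at 1m30 with 93.75. Note that under continuous decay, two flaps a few seconds apart would sum to slightly less than 2000, so the figures above are best read as approximations.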

In terms of monitoring, the interfaces will report as down, which our existing Icinga checks should alert on.

More doc:

Event Timeline

ayounsi triaged this task as Medium priority. Jun 5 2018, 8:18 AM
ayounsi created this task.
Restricted Application added a project: Operations. Jun 5 2018, 8:18 AM
Restricted Application added a subscriber: Aklapper.
ayounsi updated the task description. Jun 5 2018, 8:20 AM
BBlack moved this task from Triage to Network on the Traffic board. Jun 11 2018, 5:35 PM
Vvjjkkii renamed this task from Configure interface damping on primary links to vmbaaaaaaa. Jul 1 2018, 1:06 AM
Vvjjkkii removed ayounsi as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from vmbaaaaaaa to Configure interface damping on primary links. Jul 2 2018, 7:05 AM
CommunityTechBot assigned this task to ayounsi.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
ayounsi added a comment. Edited Jul 25 2018, 8:08 PM

Looking at doing this Wednesday August 1st, 3 PM UTC, 1h expected.

1 link at a time, only on the primary of the redundant ones, and outside link maintenance.

Edit: the fine print says this is only available starting with Junos 14.2, so it will have to wait until the new cr3-esams and cr-3/4-ulsfo are ready and the codfw/eqiad routers are upgraded.

Mentioned in SAL (#wikimedia-operations) [2019-09-17T21:01:21Z] <XioNoX> enable interface damping on primary eqiad-esams link (eqiad side) - T196432

Mentioned in SAL (#wikimedia-operations) [2019-09-18T21:09:20Z] <XioNoX> enable damping on codfw-ulsfo link - T196432

Mentioned in SAL (#wikimedia-operations) [2019-09-18T21:13:40Z] <XioNoX> enable damping on primary codfw-eqiad link - T196432

ayounsi closed this task as Resolved. Sep 18 2019, 9:16 PM

All primary links of all transport pairs now have damping configured.