Improve resiliency of the eqsin transport link
Closed, Resolved, Public


Follow up from incident

One of the issues was caused by the main transport link, which terminates on cr1-eqsin, flapping and causing 500 errors because the caches couldn't reach the main DCs.

A few options:

  1. Terminating it on cr2-eqsin (and the tunnel on cr1 for redundancy)
  2. Adding a 2nd link
  3. Configuring link damping (cf. T196432)

As the tunnel has proven quite reliable, I'd suggest doing 3 first, then ideally 2 at some point.
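
For context on option 3, here is a rough sketch of how interface damping typically behaves: each flap adds a penalty, the penalty decays exponentially with a half-life, and the link is held down while the penalty sits above a suppress threshold until it decays back to a reuse threshold. The class name, method names and all numeric values below are hypothetical and only illustrative; they are not the values configured on the eqsin link, nor the router's actual implementation.

```python
class LinkDamper:
    """Toy model of penalty-based interface damping (illustrative only)."""

    def __init__(self, half_life=5.0, suppress=2000.0, reuse=1000.0,
                 flap_penalty=1000.0):
        # All defaults are assumptions chosen for the example.
        if reuse >= suppress:
            raise ValueError("reuse threshold must be below suppress threshold")
        self.half_life = half_life        # seconds for the penalty to halve
        self.suppress = suppress          # penalty at which the link is held down
        self.reuse = reuse                # penalty at which the link is released
        self.flap_penalty = flap_penalty  # penalty added per flap
        self.penalty = 0.0
        self.suppressed = False
        self._last = None                 # time of the last update, in seconds

    def _decay(self, now):
        # Exponential decay of the accumulated penalty since the last update.
        if self._last is not None and now > self._last:
            self.penalty *= 0.5 ** ((now - self._last) / self.half_life)
        self._last = now

    def flap(self, now):
        """Record a flap at time `now`; return True if the link is suppressed."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty >= self.suppress:
            self.suppressed = True
        return self.suppressed

    def is_suppressed(self, now):
        """Return True if the link is still held down at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty <= self.reuse:
            self.suppressed = False
        return self.suppressed
```

With these illustrative numbers, a single flap passes through unsuppressed, a third flap within two seconds trips suppression, and the link is released about seven seconds later once the penalty has decayed to the reuse threshold. The point is that a rapidly flapping primary link stays down long enough for traffic to settle on the tunnel instead of bouncing back and forth.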

Event Timeline

ayounsi triaged this task as Medium priority. Oct 30 2019, 8:21 AM
ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2019-11-07T00:21:19Z] <XioNoX> enable interface damping on primary eqsin-codfw link - T236878

ayounsi claimed this task.

Damping configured.