
Interface errors on cr1-eqiad:xe-3/3/1
Closed, Resolved, Public

Description

See https://librenms.wikimedia.org/graphs/id=11623/type=port_errors/
and https://wikitech.wikimedia.org/wiki/Network_monitoring#Inbound/outbound_interface_errors

The optic most likely needs to be replaced. Please sync up with me first so I can drain the traffic beforehand.

Event Timeline

ayounsi triaged this task as Medium priority. Jan 2 2019, 4:38 PM
ayounsi created this task.

Mentioned in SAL (#wikimedia-operations) [2019-01-07T18:05:17Z] <XioNoX> deactivate bgp sessions to Zayo on T212791

Mentioned in SAL (#wikimedia-operations) [2019-01-07T18:18:40Z] <XioNoX> activate bgp sessions to Zayo on cr1-eqiad - T212791

Replaced the optics. @ayounsi, please resolve once you have confirmed all is well.

Mentioned in SAL (#wikimedia-operations) [2019-01-07T18:51:51Z] <XioNoX> re-deactivate bgp sessions to Zayo on cr1-eqiad - T212791

After the initial optics swap, the link was still not working. I then:

swapped the optics again (no change);
replaced the patch cable again (no change);
replaced the optics one more time for good measure (no change).

@ayounsi gets a link locally but cannot ping the other side.

Reply from Zayo:

The latest update is from 1/8/2018 10:53 AM CST:
Good morning, yes we do have good connectivity as our interface is with good light statistics. The BGP session is stuck in "connect" status. I bounced the BGP session and cleared damping on the session as well and found no improvements. Our logs do indicate that the session went down at 18:17 UTC due to low light received from you. Our documentation indicates that we connect to you via an Equinix cross connect at 21715 Filigree Court in Ashburn VA. Please engage your cross connect provider to check/clean their connections and report their findings to us.

Emailed Equinix to ask them to verify the X-connect.

Mentioned in SAL (#wikimedia-operations) [2019-01-14T19:28:08Z] <XioNoX> re-activate BGP to Zayo on cr1-eqiad - T212791

Mentioned in SAL (#wikimedia-operations) [2019-01-14T19:32:48Z] <XioNoX> re-deactivate BGP to Zayo on cr1-eqiad - T212791

Equinix cleaned and tested the X-connect, but the issue persists.
The next step is to do another round of testing/swapping on our side and to follow up with Zayo if that does not resolve it.

Mentioned in SAL (#wikimedia-operations) [2019-01-15T16:57:19Z] <XioNoX> move cr1-eqiad:xe-3/3/1 to xe-4/1/3 - T212791

Mentioned in SAL (#wikimedia-operations) [2019-01-15T17:06:41Z] <XioNoX> move back cr1-eqiad:xe-4/1/3 to xe-3/3/1 - T212791

Told Zayo about Equinix's test/cleanup of the X-connect. We then received a "Dispatch Charge Notification" without any request for approval, followed by these Zayo updates:

"Good afternoon, we received your update and appreciate the information. We will continue to investigate this issue and provide you with our findings."

"Good afternoon, we are expecting our field tech to be on site any moment now. We will keep you updated on our progress."

Mentioned in SAL (#wikimedia-operations) [2019-01-15T21:49:30Z] <XioNoX> re-activate BGP to Zayo on cr1-eqiad - T212791

The Zayo tech swapped their optic, and so far there have been no more errors.

I re-enabled BGP.