
Link from lsw1-e1-eqiad to lsw1-f3-eqiad down
Closed, Resolved, Public

Description

@Cmjohnson @Jclark-ctr hoping one of you might be able to help here.

This link is currently showing as down:

https://netbox.wikimedia.org/dcim/interfaces/24128/trace/

It's a bit odd, as both sides show good light levels across all 4 lanes, but the interface is staying down.

I think we should have a spare QSFP28-IR4-100G / 100GBASE-CWDM4 QSFP28 module in Eqiad (green handle). lsw1-f3-eqiad is the one reporting a local fault, so it might be an idea to swap out the QSFP28 module on that side and see if it helps. It would also be worth checking the fiber in general and making sure it's inserted correctly.

Drop me a line on IRC when you are able to look at it and I will work on it from the switch side. Hopefully it's just something simple.

Thanks.

Event Timeline

cmooney created this task.

@cmooney The QSFP28 module for et-0/0/54 on lsw1-f3-eqiad has been replaced.

@Cmjohnson @Jclark-ctr can we get someone to visit the DC urgently to look at this?

I'm concerned that we have had a link down between core devices for over a week now. I appreciate the optic was replaced, but that has not changed the situation, so we need to work on it together to find the problem and get the link back up. As things stand, live traffic is at risk because it depends on only a single link.

Touch base with me on IRC when on site and we can run through some tests. Thanks.

Thanks @Jclark-ctr that seems to have done it:

cmooney@lsw1-e1-eqiad> show interfaces et-0/0/54    
Aug 22 20:37:45
Physical interface: et-0/0/54, Enabled, Physical link is Up
  Interface index: 677, SNMP ifIndex: 525
  Description: Core: lsw1-f3-eqiad:et-0/0/54 {#G2108191173001100}

cmooney@lsw1-e1-eqiad> show ospf interface et-0/0/54.0 detail
Aug 22 20:42:00
Interface           State   Area            DR ID           BDR ID          Nbrs
et-0/0/54.0         PtToPt  0.0.0.0         0.0.0.0         0.0.0.0            1
  Type: P2P, Address: 10.64.129.10, Mask: 255.255.255.254, MTU: 9202, Cost: 8
  Adj count: 1
  Hello: 10, Dead: 40, ReXmit: 5, Not Stub

cmooney@lsw1-e1-eqiad> ping 10.64.129.11 source 10.64.129.10 rapid do-not-fragment count 10000
Aug 22 20:39:29
PING 10.64.129.11 (10.64.129.11): 56 data bytes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<--- cut --->
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--- 10.64.129.11 ping statistics ---
10000 packets transmitted, 10000 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.331/0.868/70.942/2.195 ms

cmooney@lsw1-f3-eqiad> show route table PRODUCTION.inet.0 0.0.0.0/0 exact

PRODUCTION.inet.0: 51 destinations, 53 routes (51 active, 0 holddown, 0 hidden)
@ = Routing Use Only, # = Forwarding Use Only
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          @[EVPN/170] 00:06:55
                    >  to 10.64.129.10 via et-0/0/54.0
                    [EVPN/170] 1w3d 18:33:39
                    >  to 10.64.129.24 via et-0/0/55.0
                   #[Multipath/255] 00:06:55, metric2 8
                    >  to 10.64.129.10 via et-0/0/54.0
                       to 10.64.129.24 via et-0/0/55.0
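
The checks pasted above (physical link state, OSPF neighbor count, ping loss) could be scripted roughly as follows. This is only a minimal Python sketch and not tooling used in this task: the function names are hypothetical, and the patterns assume the CLI output looks exactly like the snippets above.

```python
import re


def link_is_up(show_interfaces: str) -> bool:
    """True if the 'Physical interface' line reports the physical link as Up."""
    m = re.search(
        r"Physical interface: \S+, \w+, Physical link is (\w+)", show_interfaces
    )
    return bool(m) and m.group(1) == "Up"


def ospf_neighbor_count(show_ospf: str, ifname: str) -> int:
    """Nbrs column for ifname in 'show ospf interface' output, 0 if not listed."""
    for line in show_ospf.splitlines():
        fields = line.split()
        if fields and fields[0] == ifname:
            return int(fields[-1])  # Nbrs is the last column of the summary row
    return 0


def ping_loss_percent(ping_output: str) -> float:
    """Packet loss percentage from the ping statistics summary line."""
    m = re.search(
        r"\d+ packets transmitted, \d+ packets received, "
        r"(\d+(?:\.\d+)?)% packet loss",
        ping_output,
    )
    if m is None:
        raise ValueError("no ping statistics line found")
    return float(m.group(1))


if __name__ == "__main__":
    # Sample lines copied from the output in this task.
    intf = "Physical interface: et-0/0/54, Enabled, Physical link is Up"
    ospf = (
        "et-0/0/54.0         PtToPt  0.0.0.0         0.0.0.0"
        "         0.0.0.0            1"
    )
    ping = "10000 packets transmitted, 10000 packets received, 0% packet loss"

    healthy = (
        link_is_up(intf)
        and ospf_neighbor_count(ospf, "et-0/0/54.0") == 1
        and ping_loss_percent(ping) == 0.0
    )
    print("link healthy" if healthy else "link needs attention")
```

Splitting the checks into three small functions keeps each one tied to a single command's output, so a failure points at the layer (physical, OSPF, or forwarding) that is still broken.
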
cmooney renamed this task from Link from lsw1-e1-eqiad to lsw1-f2-eqiad down to Link from lsw1-e1-eqiad to lsw1-f3-eqiad down. Aug 22 2022, 8:44 PM
cmooney closed this task as Resolved.