
Interface errors on cr2-eqiad:xe-4/3/1
Closed, Resolved · Public

Description

cr2-eqiad:xe-4/3/1 has been showing "L3 incompletes" errors (see graphs).
This interface is one of our transit links to Telia.

According to Juniper's documentation, L3 incompletes can be caused either by corrupted packets or by non-standard packets such as Cisco's CTP.
Judging by our peer's MAC address, the remote side is indeed a Cisco device.
Cisco's CTP is the main suspect, but the errors do not occur at regular intervals, which is surprising for a periodic keepalive.

I suggest we proceed in this order:

  1. Ask Telia if they see errors on their side
  2. Ask Telia to disable CTP ("no keepalive" on the Cisco interface)
  3. If the errors continue, investigate the link and the optics
  4. As a last resort, if there are no issues on the link, add the statement "ignore-l3-incompletes" on the interface
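
While working through these steps, the counter can be read directly from the router. A minimal sketch, assuming standard Junos operational-mode commands; the exact output layout varies by platform and release, and none of these commands are quoted from this task:

```
# Show only the input-error line that carries the "L3 incompletes" counter
show interfaces xe-4/3/1 extensive | match "L3 incompletes"

# Look up the peer's MAC address (to identify the remote vendor from its OUI)
show arp no-resolve interface xe-4/3/1

# Optionally reset the counters to get a clean baseline before each step
clear interfaces statistics xe-4/3/1
```

Re-running the first command after each step shows whether the counter is still increasing.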

Event Timeline

ayounsi created this task. Apr 21 2017, 10:11 AM
Restricted Application added a project: Operations. Apr 21 2017, 10:11 AM
Restricted Application added a subscriber: Aklapper.
ayounsi added a comment. Edited Apr 21 2017, 1:12 PM

Telia ticket #00725687 opened.

  1. No errors on Telia's side, see:
show interfaces  TenGigE0/12/0/25 detail
Mon Apr 24 11:01:19.740 CET
TenGigE0/12/0/25 is up, line protocol is up
  Interface state transitions: 1
  Hardware is TenGigE, address is d46d.5010.d3c9 (bia d46d.5010.d3c9)
  Layer 1 Transport Mode is LAN
  Description: To Wikimedia Foundation, Inc.;IP_TRANSIT;IC-308845;;
  Internet address is 80.239.132.225/30
  MTU 1514 bytes, BW 10000000 Kbit (Max: 10000000 Kbit)
     reliability 255/255, txload 1/255, rxload 7/255
  Encapsulation ARPA,
  Full-duplex, 10000Mb/s, link type is force-up
  output flow control is off, input flow control is off
  Carrier delay (up) is 10 msec
  loopback not set,
  Last link flapped 33w6d
  ARP type ARPA, ARP timeout 04:00:00
  Last input 00:00:00, output 00:00:00
  Last clearing of "show interface" counters 28w0d
  30 second input rate 283822000 bits/sec, 26935 packets/sec
  30 second output rate 74174000 bits/sec, 11052 packets/sec
     3235180169801 packets input, 3966627741706909 bytes, 3464402 total input drops
     0 drops for unrecognized upper-level protocol
     Received 14123 broadcast packets, 0 multicast packets
              0 runts, 0 giants, 0 throttles, 0 parity
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     1598997118527 packets output, 246695295700551 bytes, 2431706039 total output drops
     Output 374 broadcast packets, 94117 multicast packets
     0 output errors, 0 underruns, 0 applique, 0 resets
     0 output buffer failures, 0 output buffers swapped out
     0 carrier transitions
  2. No CTP configured on their side either:

We have no cdp enabled in our router interface and hence “keepalive” parameter is not at all configured at the interface level. There is no such configuration made at interface level. We are having decent light levels as well.

  3. As these are not CRC errors, it's unlikely that the issue is on the circuit or the optics
  4. Some further troubleshooting is still possible: http://blog.ip.fi/2014/02/junos-l3-incompletes-what-and-why.html
ayounsi triaged this task as Lowest priority. Apr 27 2017, 7:06 AM
ayounsi moved this task from Backlog to Troubleshooting on the netops board. Jun 27 2017, 2:42 PM

Mentioned in SAL (#wikimedia-operations) [2017-07-12T19:37:03Z] <XioNoX> adding ignore-l3-incompletes to all peering/transit interfaces - T163542

ayounsi closed this task as Resolved. Jul 12 2017, 7:49 PM

Did some more troubleshooting on that interface and on some others showing regular L3 incompletes.
I managed to capture packets coming from various providers, toward various destinations (like the RIPE Atlas anchors).
As we don't control the source networks (and I never got replies to my emails), those alerts are non-actionable and generate noise on our monitoring.
I added the statement "ignore-l3-incompletes" to all our peering and transit interfaces. If the errors show up on our core links, we can investigate them more efficiently, but there is nothing we can do for our externally facing interfaces.
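
For reference, a sketch of what that change could look like on one interface. It assumes the statement sits under `gigether-options`, as described in Juniper's documentation for Ethernet interfaces. The configuration actually committed is not shown in this task, the commit comment is only illustrative, and xe-4/3/1 stands in for each peering or transit interface:

```
# Configuration mode; repeat the set line for each external interface
set interfaces xe-4/3/1 gigether-options ignore-l3-incompletes
show | compare
commit check
commit comment "ignore L3 incompletes on external links - T163542"
```

After the commit, the L3 incompletes counter on that interface should stop incrementing, which is what silences the monitoring alert.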