investigate ethernet errors: asw2-a5-eqiad port xe-0/0/36
Closed, Resolved (Public)

Description

asw2-a5-eqiad xe-0/0/36 has some input errors. It's part of ae0, which is the 4-port aggregate back to asw-a-eqiad, and is the only one of the 4 showing the errors. None of the 4 links (or the ae) is showing errors on the other side at asw-a-eqiad. The errors have been there since at least Sep 2013 in librenms's history.

This is possibly related to the pattern of inter-row elevated nutcracker timeout rates shown in: https://phabricator.wikimedia.org/T102199#1499358

Event Timeline

BBlack raised the priority of this task to High.
BBlack updated the task description.
BBlack added projects: acl*sre-team, netops.
BBlack added subscribers: BBlack, faidon, mark and 2 others.

I've disabled the link (1 of the 4 in the aggregate) on both sides:

{master:0}[edit]
bblack@asw2-a5-eqiad# show|compare
[edit interfaces xe-0/0/36]
+   disable;

{master:0}[edit]
bblack@asw2-a5-eqiad# commit

{master:8}[edit]
bblack@asw-a-eqiad# show|compare
[edit interfaces xe-6/1/0]
+   disable;

{master:8}[edit]
bblack@asw-a-eqiad# commit

librenms graphs show that ae0 errors have dropped back to zero.
Leaving this ticket open until the actual link issue is resolved and the link is turned back on.
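
For reference, a minimal sketch of configuration-mode commands that would produce the two diffs above, plus the reverse step for turning the link back up later. These are standard Junos `set`/`delete` statements, not copied from the session, so how the change was actually entered is an assumption. Lines starting with `#` are annotations, not input.

```
# asw2-a5-eqiad: administratively disable the erroring member link
set interfaces xe-0/0/36 disable
show | compare
commit

# asw-a-eqiad: disable the far end of the same link
set interfaces xe-6/1/0 disable
show | compare
commit

# Later, to turn the link back up, remove the statement on each side.
# asw2-a5-eqiad:
delete interfaces xe-0/0/36 disable
commit

# asw-a-eqiad:
delete interfaces xe-6/1/0 disable
commit
```

Disabling both ends takes the member out of ae0 cleanly on each switch, leaving the other three links to carry the traffic.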

I replaced the fiber. Let's turn it up and see if it's any better. The next step would be to replace the optics.

I turned it up, but it seems there is no link on it now:

xe-0/0/36       up    down Core: << asw-a-eqiad:xe-6/1/0 {#2169}
Cmjohnson claimed this task.

New fiber # is 3908... @faidon verified in IRC that all looks good.