
Interface errors on asw-c-codfw:xe-7/0/46
Closed, Resolved · Public

Description

asw-c-codfw:xe-7/0/46 has been showing input framing errors (see: https://librenms.wikimedia.org/graphs/id=9967/type=port_errors).
The port should be on member 7, so rack C7, switch serial# TA3713500191.
lvs2002:eth2 doesn't show issues. My guess is that an optic on either side needs to be re-seated or replaced.

Fixing it will cause that link to go down, so we probably want to de-pool it first.

If traffic increases, errors will most likely increase as well.

Event Timeline

ayounsi created this task. Apr 19 2017, 1:09 PM
Restricted Application added a project: Operations. Apr 19 2017, 1:09 PM
Restricted Application added a subscriber: Aklapper.

To do a soft-ish failover, on lvs2002 we can disable the puppet agent and stop pybal temporarily, wait a few minutes for traffic to settle over to lvs2005, and then re-seat or replace the optic on lvs2002 (and then restart pybal + re-enable puppet to bring lvs2002 back into service).
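The sequence above could be sketched roughly as follows. This is only an illustration, not a tested runbook: it assumes pybal runs as a systemd unit named `pybal`, the five-minute settle time is a guess at "a few minutes", and the helper names `failover_plan` and `execute` are made up for this example. It defaults to a dry run that only lists the steps.

```python
import shlex
import subprocess
import time


def failover_plan(reason, settle_seconds=300):
    """Return the ordered steps for a soft-ish failover away from this LVS host.

    Assumptions (not confirmed in this task): pybal is managed as a systemd
    unit called "pybal", and 300 seconds is long enough for traffic to settle.
    """
    return [
        ("run", ["puppet", "agent", "--disable", reason]),  # keep puppet from restarting pybal
        ("run", ["systemctl", "stop", "pybal"]),            # traffic moves to the backup LVS
        ("wait", settle_seconds),                           # let traffic settle
        ("manual", "re-seat or replace the optic"),         # physical work in the rack
        ("run", ["systemctl", "start", "pybal"]),           # bring the host back into service
        ("run", ["puppet", "agent", "--enable"]),
    ]


def execute(plan, dry_run=True):
    """Walk the plan; with dry_run=True, only describe each step."""
    log = []
    for kind, arg in plan:
        if kind == "run":
            log.append(shlex.join(arg))
            if not dry_run:
                subprocess.run(arg, check=True)
        elif kind == "wait":
            log.append(f"sleep {arg}")
            if not dry_run:
                time.sleep(arg)
        else:
            log.append(f"# manual: {arg}")
            if not dry_run:
                input(f"{arg}, then press Enter to continue: ")
    return log


if __name__ == "__main__":
    for line in execute(failover_plan("T163323")):
        print(line)
```

Keeping the puppet agent disabled for the whole window matters because a puppet run could otherwise start pybal again while the link is down.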

Change 349171 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/puppet@production] Switch einsteinium and tegmen roles

https://gerrit.wikimedia.org/r/349171

Wrong patch above, please ignore.

ayounsi claimed this task. Apr 20 2017, 2:35 PM
ayounsi updated the task description. Apr 20 2017, 2:59 PM

Mentioned in SAL (#wikimedia-operations) [2017-04-20T15:16:19Z] <XioNoX> disabling pybal on lvs2002 for T163323

ayounsi closed this task as Resolved. Apr 20 2017, 4:11 PM

papaul replaced the SFP on the switch side. Stress-testing done with bblack, no more interface errors.