
hw troubleshooting: Faulty 40GBase-LR4 link from cloudsw1-d5-eqiad to cloudsw1-f4-eqiad
Closed, Resolved · Public · Request

Description

Starting in the past 24 hours, and particularly over the last 12 hours, we have seen incrementing errors on the core WMCS link between racks D5 and F4. They present as inbound errors on cloudsw1-d5-eqiad et-0/0/53:

[Graph: inbound errors on cloudsw1-d5-eqiad et-0/0/53]

That port is connected to cloudsw1-f4-eqiad port et-0/0/54 over single-mode fiber. As usual with these things, the likely culprit is a bad optic module at one end or the other, but it is difficult to say which. The modules are 40GBase-LR4 QSFP+ (blue handle, I think).

DC-Ops, when possible, can we try replacing the module in cloudsw1-d5-eqiad et-0/0/53 to see if there is any improvement? The link has been drained, so we can do this at any time. If that does not help, we can instead try swapping the far-side module in F4. Please ping me on IRC when you are available and we can run some tests to see how it looks. Thanks.
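One way to judge "how it looks" after each swap is to sample the interface's cumulative input-error counter a few times and check whether it is still incrementing. Below is a minimal Python sketch of that check, assuming the counter readings have already been collected (for example from the switch CLI, or from the IF-MIB `ifInErrors` object over SNMP). The function names and the sample values are hypothetical and only illustrate the idea.

```python
from typing import List, Sequence


def error_deltas(samples: Sequence[int]) -> List[int]:
    """Per-interval increase of a cumulative error counter (oldest sample first).

    If the counter goes down between two samples, assume it was cleared on the
    switch, so the new reading itself is the number of errors since the reset.
    """
    deltas = []
    for earlier, later in zip(samples, samples[1:]):
        deltas.append(later - earlier if later >= earlier else later)
    return deltas


def link_is_clean(samples: Sequence[int]) -> bool:
    """True if no new errors appeared anywhere in the sampled window.

    At least two samples are required: a single reading cannot show whether
    errors are still incrementing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two counter samples")
    return sum(error_deltas(samples)) == 0


if __name__ == "__main__":
    # Hypothetical input-error readings on et-0/0/53, sampled at fixed intervals.
    before_swap = [1200, 1340, 1515, 1702]
    after_swap = [0, 0, 0, 0]
    print(error_deltas(before_swap))   # [140, 175, 187]
    print(link_is_clean(before_swap))  # False
    print(link_is_clean(after_swap))   # True
```

If the counter keeps rising after the D5 module is replaced, the same check would point at the F4 end as the next module to swap.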

Event Timeline

cmooney renamed this task from "hw troubleshooting: bad fiber cable between cloudsw1-d5-eqiad port et-0/0/53 to cloudsw1-f4-eqiad port et-0/0/54" to "hw troubleshooting: Faulty 40GBase-LR4 link from cloudsw1-d5-eqiad to cloudsw1-f4-eqiad". Tue, Jun 11, 4:37 PM
cmooney updated the task description.
cmooney added a subscriber: VRiley-WMF.

Swapped the 40GBase-LR4 optic in D5 port et-0/0/53.

cmooney lowered the priority of this task from Medium to Low. Tue, Jun 11, 6:28 PM
cmooney subscribed.

Thanks for the help with this, @VRiley-WMF. The link has now been clean for over an hour, so I think we can safely conclude that the optic from cloudsw1-d5-eqiad et-0/0/53 was faulty. We should dispose of that one now.

I guess we can close this now too. The one question I have is whether we have more spares of that type, or whether we should order a replacement.

You're welcome, @cmooney. We do have spares if they are needed in the future. Closing this ticket.