
cr3-eqsin:xe-0/1/1 interface errors
Closed, Resolved (Public)

Description

See https://librenms.wikimedia.org/graphs/id=13969/type=port_errors/

Usual cable/optic swap. One complication is that the other side is a transit provider and the issue could be on their side. I still recommend checking our side first.
BGP to that peer needs to be disabled ahead of the onsite work.

Ideally to be added to T294968, but we should fix it sooner rather than later (e.g. within the next month at most).

Circuit ID IC-331928

  • 2022-06-15 - optic swapped and issue remains, need to escalate to Telia to check the link on their end.
  • 2022-06-16 - Arelion support case 01418061 to investigate

Event Timeline

ayounsi triaged this task as Medium priority. Jan 31 2022, 10:43 AM
ayounsi created this task.

Sorry about that, the SRX just shipped today! So I'll have Jin work on this when he goes out to install the SRX for mr1-eqsin replacement sometime next week?

Looks like we forgot about that during Jin's visits. What's the next step?

Jin,

When you were last onsite, I neglected to include the swap of a problematic optic we have.

Can you quote us for an on-site to swap the optic in cr3-eqsin:xe-0/1/1, located in 603, U40? It should have an SFP+ LR optic installed, and we need it swapped with one of the spare LR optics onsite in our racks. We should still have a couple of spares there, but if not I'll order more. Do you recall if we had any? (My records show we should have at least 2 spares left, but my records could be wrong.) This optic has patch #1120 attached.

We'll have you swap the optic and put the possibly defective optic in a bag with the task # T300485 on it so if the errors disappear on the interface, we'll have you throw it away at a future onsite visit.

So an optic swap that we'll schedule to take place overlapping with our netops staff hours. They are located in the EU, so it'll be something like your afternoon, their morning.

Let me know if there are any questions.

Thanks in advance,

Entered https://wikimedia.coupahost.com/easy_form_responses/3234 into coupa for this work, Jin will coordinate with Arzhel via email and hangout for the actual work window.

Mentioned in SAL (#wikimedia-operations) [2022-06-15T06:52:58Z] <XioNoX> disable BGP to Telia in eqsin for optic replacement - T300485

Mentioned in SAL (#wikimedia-operations) [2022-06-15T08:03:27Z] <XioNoX> re-enable BGP to Telia in eqsin for optic replacement - T300485

We tried:

  • New optic
  • New patch cable
  • New router port

And the errors are still present. Next step is to follow up with Telia so they check their side and then the DC for the X-connect.

Arzhel,

Would you be the person to open the ticket with Telia? Just checking, to ensure we don't both assume the other is handling it.

Ideally I'd like DCops to take care of link/interface level problems. I'm happy to help if needed though.

Worked on the email draft with Arzhel and just emailed it in, CC'ing both Arzhel and Cathal. Once I have more info I'll update this ticket.

Arelion support case 01418061 opened to investigate. I'll follow up with them as they progress the case.

@ayounsi,

So as you can see, they advised that they want us to go and investigate the cross-connect, and if that results in charges we'll use that thread to get a credit on our Arelion account.

I can file the ticket for a loopback to be placed on the Z side termination for testing, but since this is an active transport, I wanted to schedule this downtime with you in advance of filing the ticket, so I can list the window on the ticket.

Please advise, and keep in mind we ideally schedule this 2 business days out minimum due to timezone differences. If I include the window on the ticket, we can advise them with a loopback start and end time.

Noted, that's the link we do the least traffic on so we can keep it down for some time. I'll take care of disabling BGP on Monday.

Ticket 1-218053856766 opened for the loopback test.

Support,

We need to test our cross-connection 20676697-A, which terminates into our panel @ PP:0603:1087235 - 15/16 and from that into our router cr3-eqsin (WMF7241) xe-0/1/1.

We need the loopback placed on the Z side of this cross-connection, just before it terminates into Telia's patch panel. We're experiencing line errors, and are attempting to isolate where the fault is occurring. This will allow us to test the cross-connection from our router.

We have already swapped the optics and patch cable in our rack, so this is our next step.

Please place the loopback starting on 2022-06-23 @ 09:00 Singapore time. Our engineers work in EU timezones, so they'll be able to perform any loopback testing required later on 2022-06-23.

Please plan to remove the loopback and return the line to normal service on 2022-06-24 @ 09:00 AM Singapore time.

Our Network Engineers will drain this link of use on 2022-06-22, so you can begin work on 2022-06-23 without confirming with us before start. To repeat, no need to confirm the start of this window, as we've already scheduled it with our internal teams.

Thanks!
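For reference, the loopback window requested above can be converted from Singapore time to UTC and to a European zone. This is only a minimal sketch: the dates and 09:00 start/end times come from the email, while the fixed UTC+2 offset (Central European Summer Time) is an assumption, since the thread only says the engineers work in EU timezones.

```python
from datetime import datetime, timedelta, timezone

# Singapore has no daylight saving time, so a fixed UTC+8 offset is safe.
SGT = timezone(timedelta(hours=8), "SGT")
# Assumption: engineers are on Central European Summer Time (UTC+2) in June.
CEST = timezone(timedelta(hours=2), "CEST")

# Window from the email: 2022-06-23 09:00 to 2022-06-24 09:00 Singapore time.
window_start = datetime(2022, 6, 23, 9, 0, tzinfo=SGT)
window_end = datetime(2022, 6, 24, 9, 0, tzinfo=SGT)


def describe(moment: datetime) -> str:
    """Render one instant in Singapore time, UTC, and the assumed EU zone."""
    fmt = "%Y-%m-%d %H:%M %Z"
    return " | ".join(
        moment.astimezone(tz).strftime(fmt) for tz in (SGT, timezone.utc, CEST)
    )


print("loopback placed: ", describe(window_start))
print("loopback removed:", describe(window_end))
```

Under that assumption the loopback goes in during the early European morning, which leaves the rest of the EU working day for testing.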

Ok, they can place the loop back 1 hop away on sg1 side of things and asked if they could do so today while on the call. I advised not yet, as we haven't drained that of traffic.

@ayounsi or @cmooney: Can you drain this link of use immediately so they can start the work tomorrow rather than Thursday?

Change 807492 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] eqsin: disable Telia transit

https://gerrit.wikimedia.org/r/807492

Change 807492 merged by jenkins-bot:

[operations/homer/public@master] eqsin: disable Telia transit

https://gerrit.wikimedia.org/r/807492

Ok now SG3 staff are telling me my ticket isn't valid for this type of thing, despite telling me on a voice call yesterday they'd place it today, and require me to raise a trouble ticket, not a remote hands loopback test ticket.

Rather than argue with them, because they will win, I've opened 1-217601218912. They should install the loopback immediately and leave it in place until directed otherwise. The loopback should be placed back to our A side, but 1 hop from the Z side termination.

Dear Customer,

We have placed a loop from the far end 1 hop before Telia towards your side in SG3.

Kindly check on your end and advise us if we can normalize the loop.

Thank you

Can you check and see if the loopback shows all good so we can figure out next steps?

Thanks, no errors there. Please remove the loop and follow up with Telia.

Updated SG3 to remove the loopback and return the circuit to service, sent email reply to Arelion support thread to request next steps since the cross connection tested fine.

Hello Rob,

We’ll open to our 2nd line to dispatch a technician to loop test at our panel.

In future, for the first step please let us know while bi-directional loops are placed by the cross-connect provider so we can check our side facing the loop.

Best Regards ,

Pravin Kularajah

Customer Support Engineer

Hello Team,

Kindly be informed that the Tech has been ordered for tomorrow at 0800 GMT.

Looking forward to keeping you updated.

Thank you

Best Regards,

George Mwakigali

Customer Support Engineer

Looks like there are no more errors. @RobH could you check it one last time before replying to Telia?

cr3-eqsin> show interfaces xe-0/1/1 extensive | match error
[...] # Everything should show 0, especially:
    Bit errors                             0
    Errored blocks                         0
[...]
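The manual check above could also be scripted. The following is a rough sketch, not the team's actual tooling: `nonzero_error_counters` is a hypothetical helper, and the sample text only reproduces the two counters quoted above (the elided `[...]` parts are left out). It handles only lines of the form "name, two or more spaces, integer"; lines in other layouts are ignored.

```python
import re

# Sample modeled on the two counters quoted above; real output has more fields.
SAMPLE_OUTPUT = """\
    Bit errors                             0
    Errored blocks                         0
"""

# A counter name, then at least two spaces, then an integer value.
COUNTER_LINE = re.compile(r"^\s*(.+?)\s{2,}(\d+)\s*$")


def nonzero_error_counters(cli_output: str) -> dict[str, int]:
    """Return {counter name: value} for every parsed counter that is not 0."""
    nonzero = {}
    for line in cli_output.splitlines():
        match = COUNTER_LINE.match(line)
        if match and int(match.group(2)) != 0:
            nonzero[match.group(1)] = int(match.group(2))
    return nonzero


bad = nonzero_error_counters(SAMPLE_OUTPUT)
print("clean" if not bad else f"errors present: {bad}")
```

An empty result means every parsed counter is zero. Since these counters are cumulative, comparing two samples taken some time apart would be a more reliable check than a single reading.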

Indeed, I see no more errors. They have cleared up since Arelion investigated earlier this week.

It's great that it's resolved, but not so great that no one claimed responsibility. The errors seem to have cleared up after all the troubleshooting, but it could have been a dirty patch cable at any point, and no one (not EQ, not Arelion) has noted or admitted to any hardware failure in their part of the circuit.

Overall this is now solved, but @wiki_willy should be aware we'll have an hour or two of troubleshooting on our eqsin bill, and since no one admitted fault we're unlikely to be able to pass those charges to Arelion.