
eqiad: upgrade row C and D uplinks from 4x10G to 1x40G
Closed, ResolvedPublic

Description

With the new eqiad MPC7E linecards (see T304712) and the work done in T308331: eqiad: move non WMCS servers out of rack D5, we now have the capacity to run a single 40G link from rows C and D to the core routers.

This will bring multiple benefits:

  • Improve performance and help with T291627: Packet Drops on Eqiad ASW -> CR uplinks (right now two large flows being balanced on the same 10G link can cause packet drops)
  • Standardize LACP bundles (right now the 4 links end on multiple linecards and both spines (fpc2/fpc7)), with more explicit failure scenarios
  • Clean up cabling (remove 12 fibers running across the cage)
  • Clean up switch/router config (fewer interfaces, no need for bandwidth-threshold knobs)
  • Anticipate link move from old 10G linecards to new 40/100G

The main downside being:

  • Discrepancies between rows A/B and C/D, which is a minor drawback compared to the benefits. Those rows are already different anyway as they have fewer prod racks
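To illustrate the first benefit listed above, here is a minimal sketch of how per-flow hashing can overload one member of a 4x10G bundle while the same traffic fits on a single 40G link. Everything in it is an assumption made for illustration: the CRC32-over-5-tuple hash is a stand-in and not the algorithm the switches actually use, and the function names, addresses and flow sizes are hypothetical.

```python
import zlib


def pick_member(flow, n_links):
    """Map a flow 5-tuple to a bundle member index.

    Illustrative hash only (CRC32 modulo link count); real hardware
    uses its own vendor-specific hashing.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_links


def link_load(flows, n_links, capacity_gbps):
    """Return (per-link offered load in Gbps, indexes of oversubscribed links)."""
    load = [0.0] * n_links
    for flow, gbps in flows:
        load[pick_member(flow, n_links)] += gbps
    over = [i for i, gbps in enumerate(load) if gbps > capacity_gbps]
    return load, over


# Hypothetical traffic: five 6 Gbps flows (src, dst, proto, sport, dport).
flows = [(("10.0.0.%d" % i, "10.1.0.1", 6, 40000 + i, 443), 6.0) for i in range(5)]

# 4x10G: five flows over four links means at least two flows share a link,
# so that link is offered at least 12 Gbps.
load_4x10, over_4x10 = link_load(flows, n_links=4, capacity_gbps=10.0)

# 1x40G: the same 30 Gbps lands on one link with room to spare.
load_1x40, over_1x40 = link_load(flows, n_links=1, capacity_gbps=40.0)

print("4x10G load:", load_4x10, "oversubscribed:", over_4x10)
print("1x40G load:", load_1x40, "oversubscribed:", over_1x40)
```

A flow is pinned to one member link for its whole lifetime, so the bundle's aggregate capacity does not help when the hash places two large flows together. With a single 40G link there is nothing to balance.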

Event Timeline

ayounsi triaged this task as Medium priority. Jul 21 2022, 6:39 AM
ayounsi created this task.
ayounsi mentioned this in Unknown Object (Task). Jul 21 2022, 7:08 AM
ayounsi added a subtask: Unknown Object (Task).
Jclark-ctr closed subtask Unknown Object (Task) as Resolved. Aug 17 2022, 6:51 PM

Row C got moved to the new linecards with no issues, but moving cr1<->row D caused an outage.

As row C cleanup, @Jclark-ctr can you remove the following (now unused) fibers and matching optics, then update Netbox?
1984
3458
3464
2826
2627
3463
2827
3462

Additionally I created the following cables and matching interfaces:
https://netbox.wikimedia.org/dcim/cables/5711/
https://netbox.wikimedia.org/dcim/cables/5712/
https://netbox.wikimedia.org/dcim/cables/5713/
Could you double check them (label, length, color, type, endpoints, etc)?

We will continue with row D in a future window.

Also, it looks like the optic or fiber needs to be replaced; the error rate is high: https://librenms.wikimedia.org/device/device=162/tab=port/port=25733/

As a temporary measure we can fail VRRP over to the other cr.

Can this be changed at any time? I will work on the Netbox updates when I am not in the data center.

This automatically opened T314998: Inbound interface errors.

Please sync up with Netops before doing the work as live traffic is using the port.

Mentioned in SAL (#wikimedia-operations) [2022-10-11T15:09:10Z] <XioNoX> disable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463

Mentioned in SAL (#wikimedia-operations) [2022-10-11T18:11:26Z] <XioNoX> re-enable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463

@Jclark-ctr could you run this fiber between cr2-eqiad and asw2-d7-eqiad (and connect it and add the optic on the asw side): https://netbox.wikimedia.org/dcim/cables/5899/ ? Please use QSFP+-40G-SR4 optics and update Netbox.

Similar to https://netbox.wikimedia.org/dcim/cables/5712/ but for row D.

Also, those 3 optics are listed as singlemode, but I think they're multimode; could you double check?
https://netbox.wikimedia.org/dcim/cables/5711/
https://netbox.wikimedia.org/dcim/cables/5712/
https://netbox.wikimedia.org/dcim/cables/5713/

akosiaris subscribed.

Removing SRE, has already been triaged to a more specific SRE subteam

Mentioned in SAL (#wikimedia-operations) [2023-06-14T10:11:05Z] <XioNoX> disable cr1<->row D link for link migration - T313463

Mentioned in SAL (#wikimedia-operations) [2023-06-14T10:40:16Z] <XioNoX> eqiad row D, move VRRP primary to cr1 - T313463

Mentioned in SAL (#wikimedia-operations) [2023-06-14T11:00:49Z] <XioNoX> disable cr2<->row D link for link migration - T313463

Mentioned in SAL (#wikimedia-operations) [2023-06-14T11:11:23Z] <XioNoX> eqiad row D, move VRRP primary back to cr2 - T313463

@ayounsi I removed the 8 cables and deleted them from Netbox.

When working on something else I noticed that those were still in Netbox:
https://netbox.wikimedia.org/dcim/cables/1594/
https://netbox.wikimedia.org/dcim/cables/1598/
https://netbox.wikimedia.org/dcim/cables/1612/
https://netbox.wikimedia.org/dcim/cables/1616/

Could you double check if they're still connected, and disconnect them or remove them from Netbox?

Disconnected and removed from Netbox.