
eqiad: upgrade row C and D uplinks from 4x10G to 1x40G
Closed, Resolved (Public)

Description

With the new eqiad MPC7E linecards (see T304712) and the work done in T308331: eqiad: move non WMCS servers out of rack D5, we now have the capacity to run a single 40G link from rows C and D to the core routers.

This will bring multiple benefits:

  • Improve performance and help with T291627: Packet Drops on Eqiad ASW -> CR uplinks (right now, two large flows balanced onto the same 10G link can cause packet drops)
  • Standardize LACP bundles (right now the 4 links end on multiple linecards and on both spines, fpc2/fpc7), which makes failure scenarios more explicit
  • Clean up cabling (remove 12 fibers running across the cage)
  • Clean up switch/router config (fewer interfaces, no need for bandwidth-threshold knobs)
  • Anticipate link move from old 10G linecards to new 40/100G
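The packet-drop point above can be illustrated with a minimal sketch, assuming the bundle picks a member link per flow by hashing the flow's 5-tuple. The hash (CRC32), the function names, and the flow values are all hypothetical and chosen for illustration; they are not the switches' actual algorithm or real traffic. Five 6 Gbps flows total only 30 Gbps, yet on 4x10G at least two of them must share a member, while the same traffic fits on 1x40G.

```python
import zlib
from collections import defaultdict


def member_for_flow(flow, n_members):
    """Pick a bundle member by hashing the flow 5-tuple.

    CRC32 is an illustrative stand-in, not the hash the switches use.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_members


def oversubscribed_members(flows, n_members, member_gbps):
    """Return {member index: offered Gbps} for members loaded past capacity."""
    load = defaultdict(float)
    for flow, gbps in flows:
        load[member_for_flow(flow, n_members)] += gbps
    return {m: total for m, total in load.items() if total > member_gbps}


if __name__ == "__main__":
    # Hypothetical flows: (src, dst, protocol, src port, dst port), 6 Gbps each.
    flows = [(("10.0.0.1", "10.0.1.1", 6, 40000 + i, 443), 6.0) for i in range(5)]

    # 4x10G: five flows over four members, so some member carries at least
    # two flows (12 Gbps) whatever the hash does.
    print("4x10G oversubscribed:", oversubscribed_members(flows, 4, 10.0))

    # 1x40G: all 30 Gbps share one link with headroom to spare.
    print("1x40G oversubscribed:", oversubscribed_members(flows, 1, 40.0))
```

With only two large flows, the collision is a matter of chance rather than certainty, which matches the intermittent drops described in T291627.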

The main downside being:

  • Discrepancies between rows A/B and C/D, which is a minor drawback compared to the benefits. Those rows are already different anyway, as they have fewer prod racks

Event Timeline

ayounsi triaged this task as Medium priority.
ayounsi mentioned this in Unknown Object (Task). Jul 21 2022, 7:08 AM
ayounsi added a subtask: Unknown Object (Task).
Jclark-ctr closed subtask Unknown Object (Task) as Resolved. Aug 17 2022, 6:51 PM

Row C got moved to the new linecards with no issues, but moving cr1<->row D caused an outage.

As row C cleanup, @Jclark-ctr can you remove the following (now unused) fibers and matching optics, then update Netbox?
1984
3458
3464
2826
2627
3463
2827
3462

Additionally I created the following cables and matching interfaces:
https://netbox.wikimedia.org/dcim/cables/5711/
https://netbox.wikimedia.org/dcim/cables/5712/
https://netbox.wikimedia.org/dcim/cables/5713/
Could you double check them (label, length, color, type, endpoints, etc)?

We will continue with row D in a future window.

It also looks like the optic or fiber needs to be replaced, as the error rate is high: https://librenms.wikimedia.org/device/device=162/tab=port/port=25733/

As a temporary measure we can fail VRRP over to the other cr.

Can this be changed at any time? I will work on the Netbox updates when I am not in the data center.

This automatically opened T314998: Inbound interface errors.

Please sync up with Netops before doing the work as live traffic is using the port.

Mentioned in SAL (#wikimedia-operations) [2022-10-11T15:09:10Z] <XioNoX> disable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463

Mentioned in SAL (#wikimedia-operations) [2022-10-11T18:11:26Z] <XioNoX> re-enable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463

@Jclark-ctr could you run this fiber between cr2-eqiad and asw2-d7-eqiad (and connect it and add the optic on the asw side): https://netbox.wikimedia.org/dcim/cables/5899/ ? Please use QSFP+-40G-SR4 optics and update Netbox.

Similar to https://netbox.wikimedia.org/dcim/cables/5712/ but for row D.

Also those 3 optics mention singlemode, but I think they're multimode, could you double check?
https://netbox.wikimedia.org/dcim/cables/5711/
https://netbox.wikimedia.org/dcim/cables/5712/
https://netbox.wikimedia.org/dcim/cables/5713/
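The singlemode/multimode question above can be turned into a mechanical check. The sketch below assumes SR-family optics (such as the QSFP+-40G-SR4 requested for row D) run over multimode fiber and LR-family optics over singlemode, and that cable types are recorded with slugs starting with `mmf` or `smf`. The record layout, the helper names, and the sample entries are hypothetical and do not describe the actual Netbox objects.

```python
def expected_fiber_mode(optic_model):
    """Map an optic model name to the fiber type prefix it should use.

    Assumption: "-SR" in the model means multimode ("mmf") and "-LR"
    means singlemode ("smf"). Anything else returns None (unknown).
    """
    model = optic_model.upper()
    if "-SR" in model:
        return "mmf"
    if "-LR" in model:
        return "smf"
    return None


def mismatched_cables(cables):
    """Return the ids of cables whose recorded type disagrees with the optic.

    `cables` maps a cable id to an (optic model, cable type slug) pair.
    Cables with an unrecognised optic model are skipped, not flagged.
    """
    bad = []
    for cable_id, (optic_model, cable_type) in cables.items():
        expected = expected_fiber_mode(optic_model)
        if expected is not None and not cable_type.startswith(expected):
            bad.append(cable_id)
    return bad


if __name__ == "__main__":
    # Hypothetical records, not copied from Netbox.
    sample = {
        "example-a": ("QSFP+-40G-SR4", "smf"),      # SR optic recorded as singlemode
        "example-b": ("QSFP+-40G-SR4", "mmf-om4"),  # consistent
    }
    print(mismatched_cables(sample))
```

A check like this only flags inconsistent records; whether the optic entry or the cable entry is the wrong one still needs a physical look.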

akosiaris subscribed.

Removing SRE, has already been triaged to a more specific SRE subteam

Mentioned in SAL (#wikimedia-operations) [2023-06-14T10:11:05Z] <XioNoX> disable cr1<->row D link for link migration - T313463

Mentioned in SAL (#wikimedia-operations) [2023-06-14T10:40:16Z] <XioNoX> eqiad row D, move VRRP primary to cr1 - T313463

Mentioned in SAL (#wikimedia-operations) [2023-06-14T11:00:49Z] <XioNoX> disable cr2<->row D link for link migration - T313463

Mentioned in SAL (#wikimedia-operations) [2023-06-14T11:11:23Z] <XioNoX> eqiad row D, move VRRP primary back to cr2 - T313463

@ayounsi I removed 8 cables and deleted them from Netbox.

When working on something else I noticed that those were still in Netbox:
https://netbox.wikimedia.org/dcim/cables/1594/
https://netbox.wikimedia.org/dcim/cables/1598/
https://netbox.wikimedia.org/dcim/cables/1612/
https://netbox.wikimedia.org/dcim/cables/1616/

Could you double check whether they're still connected, and either disconnect them or remove them from Netbox?

Disconnected and removed from Netbox.