
eqiad: Move links to new MPC7E linecard
Closed, Resolved, Public

Description

Similar to T289241, but for eqiad. Opening it as a placeholder for now, to be filled in before doing the actual move.

  • You can configure different combinations of port speeds as long as the aggregate capacity per group of six ports labeled 0/0 through 0/5 does not exceed 240 Gbps. Similarly, the aggregate capacity of the other group of six ports, labeled 1/0 through 1/5, should not exceed 240 Gbps.
  • Four out of the twelve ports can be configured as 100-Gigabit Ethernet ports: port numbers 0/2, 0/5, 1/2 and 1/5.

cr1-eqiad

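| Done | Port | Connector | Z side | Used ports | Used capacity | Note |
| --- | --- | --- | --- | --- | --- | --- |
|  | 0/0 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 0/1 | QSFPP-4X10GE-LR | cr2-eqiad:xe-4/3/0 | 1/4 | 10G |  |
|  | 0/2 | QSFP-100G-CWDM4 | lsw1-e1 | 1/1 | 100G |  |
|  | 0/3 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 0/4 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 0/5 |  |  |  |  |  |
|  | 1/0 | QSFPP-40G-SR | asw2-c-eqiad:et-2/0/53 | 1/1 | 40G |  |
|  | 1/1 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 1/2 | QSFP-100G-CWDM4 | Reserved for future link to codfw | 1/1 | 100G |  |
|  | 1/3 | QSFPP-40G-SR | asw2-d-eqiad:et-2/0/49 | 1/1 | 40G |  |
|  | 1/4 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 1/5 |  |  |  |  |  |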

Total capacity group 0: 110
Total capacity group 1: 180

cr1-eqiad from FPC4 to FPC3

| Done | FPC4 | Z side | FPC3 | Note |
| --- | --- | --- | --- | --- |
|  | xe-4/0/0 | asw2-a-eqiad:xe-7/0/44 {#1985} | xe-3/0/2 |  |
|  | xe-4/0/1 | asw2-b-eqiad:xe-2/0/45 {#3457} | xe-3/2/3 |  |
|  | xe-4/1/0 | asw2-a-eqiad:xe-7/0/45 {#3454} | xe-3/1/2 |  |
|  | xe-4/1/1 | asw2-b-eqiad:xe-7/0/44 {#3459} | xe-3/3/3 |  |
|  | xe-4/2/0 | (Arelion, IC-307235, 34ms 10Gbps wave) {#5226} | xe-3/2/2 |  |
|  | xe-4/2/2 | GTT (680970, 10Gbps VPLS) {#3466} | xe-3/0/7 |  |
|  | xe-4/3/1 | Peering: Hurricane Electric (N/A) {#3909} | xe-3/1/5 |  |
|  | xe-4/3/2 | Transit: NTT {#3475} | xe-3/1/6 |  |

cr2-eqiad

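| Done | Port | Connector | Z side | Used ports | Used capacity | Note |
| --- | --- | --- | --- | --- | --- | --- |
|  | 0/0 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 0/1 | QSFPP-4X10GE-LR | cr1-eqiad:xe-4/3/0; Transport: cr2-eqord:xe-0/1/5 | 2/4 | 20G |  |
|  | 0/2 | QSFP-100G-CWDM4 | lsw1-f1 | 1/1 | 100G |  |
|  | 0/3 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 0/4 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 0/5 |  |  |  |  |  |
|  | 1/0 | QSFPP-40G-SR | asw2-c-eqiad:et-7/0/53 | 1/1 | 40G |  |
|  | 1/1 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 1/2 |  |  |  |  |  |
|  | 1/3 | QSFPP-40G-SR | asw2-d-eqiad:et-7/0/53 | 1/1 | 40G |  |
|  | 1/4 | QSFPP-4X10GE-LR |  | 0/4 | 0G |  |
|  | 1/5 |  |  |  |  |  |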

Total capacity group 0: 120
Total capacity group 1: 80
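
The per-group totals in both port tables can be re-derived and compared with the 240 Gbps limit from the description. The following is a minimal sketch, not part of the original task: the names `PLAN`, `group_totals` and `check` are hypothetical, and it sums the "Used capacity" column the same way the task does. If the limit is applied to configured port speeds instead, unused but configured ports would need to be counted as well.

```python
# Limits quoted in the task description.
GROUP_LIMIT_GBPS = 240                      # per group of six ports
PORTS_100G = {"0/2", "0/5", "1/2", "1/5"}   # only these ports can run at 100G

# Used capacity in Gbps per port, copied from the two port tables.
# Ports with 0G used are omitted. cr1 port 1/2 is the reserved codfw link.
PLAN = {
    "cr1-eqiad": {"0/1": 10, "0/2": 100, "1/0": 40, "1/2": 100, "1/3": 40},
    "cr2-eqiad": {"0/1": 20, "0/2": 100, "1/0": 40, "1/3": 40},
}


def group_totals(ports):
    """Sum used capacity per port group ("0" = ports 0/x, "1" = ports 1/x)."""
    totals = {"0": 0, "1": 0}
    for port, gbps in ports.items():
        totals[port.split("/")[0]] += gbps
    return totals


def check(ports):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for port, gbps in ports.items():
        if gbps == 100 and port not in PORTS_100G:
            problems.append(f"port {port} cannot be configured as 100G")
    for group, total in group_totals(ports).items():
        if total > GROUP_LIMIT_GBPS:
            problems.append(f"group {group} uses {total}G, over the {GROUP_LIMIT_GBPS}G limit")
    return problems


for router, ports in PLAN.items():
    totals = group_totals(ports)
    headroom = {g: GROUP_LIMIT_GBPS - t for g, t in totals.items()}
    print(router, "totals:", totals, "headroom:", headroom, "problems:", check(ports))
```

Run against the tables as they stand, the totals match the ones listed in the task (110/180 for cr1-eqiad, 120/80 for cr2-eqiad).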

cr2-eqiad from FPC4 to FPC3

| Done | FPC4 | Z side | FPC3 | Note |
| --- | --- | --- | --- | --- |
|  | xe-4/0/0 | asw2-a-eqiad:xe-7/0/46 {#3455} | xe-3/0/2 |  |
|  | xe-4/0/1 | asw2-b-eqiad:xe-2/0/47 {#3460} | xe-3/2/3 |  |
|  | xe-4/1/0 | asw2-a-eqiad:xe-7/0/47 {#1986} | xe-3/1/2 |  |
|  | xe-4/1/1 | asw2-b-eqiad:xe-7/0/47 {#3461} | xe-3/3/1 |  |
|  | xe-4/1/3 | Transport: cr2-esams:xe-0/1/3 {#2013} | xe-3/2/1 |  |
|  | xe-4/3/1 | Transit: Telia {#3861} | xe-3/3/2 |  |

Event Timeline


@ayounsi I did all the switch uplinks to both core routers; please double-check that it all looks good.
Thanks

Checking LLDP, it all looks good to me.
I'd have preferred that the links not be crossed between FPC and CR, for example that all the links on FPC2 go to cr1, but now is not a good time to change that.

Transit/transport links need special care to maintain redundancy in case an optic dies.

@ayounsi I updated the table with the transit/transport links. Please double-check.
For cr1 to cr2 I have a total of 3 links: 2 on FPC3 and 1 on FPC4. My guess is that the link on FPC4 is there in case FPC3 goes bad. So my question is: are we keeping the link on FPC4?

Thanks!

I like your idea of putting the capacity in the table, I added dedicated columns for it.

Note that I don't know if there is enough total ports/capacity/diversity. So don't be surprised when you run the numbers :)
If there is enough could you let us know how much free ports/capacity that leaves us with?
If there is not enough, which limit do we hit first?

  • FPC4 will need to be decommissioned (it's 10 years old). Seeing the traffic level on ae0, a 3x10G LAG is preferred, but we can temporarily keep one of them on FPC4
  • Some links seem missing:
    • The Equinix IX port from cr1-eqiad,
    • links to cloudsw
    • links to pfw
    • link to cloudflare, even though we're not using it yet so we can put that one on hold for now
  • We need to keep 100G capacity on cr1 for the future link to codfw (which will free up 1x10G port Telia transport) so feel free to add it in the table - T293091
  • Similarly we will need 1x10G extra on cr2, on the same interface as the Lumen link to keep things simpler, you can account for it in the table as well - T293091
  • The two cr2-eqiad Telia transit ports can go on the same interface as well for simplicity (no need for PIC diversity) - https://netbox.wikimedia.org/dcim/interfaces/9076/

Thanks!

I like your idea of putting the capacity in the table, I added dedicated columns for it.

Note that I don't know if there is enough total ports/capacity/diversity. So don't be surprised when you run the numbers :)
If there is enough could you let us know how much free ports/capacity that leaves us with?
If there is not enough, which limit do we hit first?

  • FPC4 will need to be decommissioned (it's 10 years old). Seeing the traffic level on ae0, a 3x10G LAG is preferred, but we can temporarily keep one of them on FPC4
  • Some links seem missing:
    • The Equinix IX port from cr1-eqiad,
    • links to cloudsw
    • links to pfw
    • link to cloudflare, even though we're not using it yet so we can put that one on hold for now
  • We need to keep 100G capacity on cr1 for the future link to codfw (which will free up 1x10G port Telia transport) so feel free to add it in the table - T293091 Done
  • Similarly we will need 1x10G extra on cr2, on the same interface as the Lumen link to keep things simpler, you can account for it in the table as well - T293091 Done
  • The two cr2-eqiad Telia transit ports can go on the same interface as well for simplicity (no need for PIC diversity) - https://netbox.wikimedia.org/dcim/interfaces/9076/ Done

@ayounsi we are at full capacity on both groups for cr1, and we have 2 links that we still need to connect:
1 - link to cr2-eqiad xe-4/3/0
2 - link to Hurricane Electric xe-4/3/1
On the other side, cr2-eqiad still has 10G available in group 0 and 100G in group 1. The only link left to connect on cr2-eqiad is the link to cr1-eqiad (xe-4/3/0).

Thanks!
Working on next FY's budget, I noticed that FPC3 (MPC4E-3D-32XGE-SFPP, 2016) still has a few years before being due for a refresh. We do have to migrate links away from FPC4 though (2012).

Trying to avoid scope creep, but using this opportunity to upgrade to 40G uplinks where possible (row C anytime, and row D when T308331 is completed) will make the move easier and reduce the risk of link saturation.

I realize it means re-shuffling the tedious work you already did.

Looking at cr1, it should look like this:

xe-3/0/0 - Core: asw2-a-eqiad:xe-2/0/44 {#4776} --- Leave as is
xe-3/0/1 - Core: asw2-b-eqiad:xe-2/0/44 {#1989} --- Leave as is
xe-3/0/2 - Core: asw2-c-eqiad:xe-2/0/44 {#1984} --- Move to FPC1 - 40G
xe-3/0/3 - Core: asw2-d-eqiad:xe-2/0/40 {#3872} --- Move to FPC1 - 40G
xe-3/0/4 - Core: cloudsw1-c8-eqiad:xe-0/0/0 {#5263} --- Leave as is
xe-3/0/5 - Peering: Cloudflare {#8434} --- Leave as is
xe-3/0/6 - Peering: Equinix {#2009} --- Leave as is
xe-3/1/0 - Core: asw2-a-eqiad:xe-2/0/45 {#3452} --- Leave as is
xe-3/1/1 - Core: asw2-b-eqiad:xe-7/0/45 {#1991} --- Leave as is
xe-3/1/2 - Core: asw2-c-eqiad:xe-2/0/45 {#3458} --- Move to FPC1 - 40G
xe-3/1/3 - Core: asw2-d-eqiad:xe-2/0/41 {#3898} --- Move to FPC1 - 40G
xe-3/1/4 - Transport: cr1-drmrs:xe-0/1/2 (Telxius, 10G Wave) {#3482} --- Leave as is
xe-3/1/7 - Core: pfw3-eqiad:xe-0/0/16 {#4026} --- Leave as is
xe-3/2/0 - Core: cr2-eqiad:xe-3/2/0 {#1983} --- Leave as is
xe-3/2/1 - Peering: Facebook (FC-5205147) {#2648} --- Leave as is
xe-3/3/0 - Core: cr2-eqiad:xe-3/3/0 {#2651} --- Leave as is
xe-3/3/2 - Transit: Lumen (442550281) {#3867} --- Leave as is
------------------------
xe-4/0/0 - Core: asw2-a-eqiad:xe-7/0/44 {#1985} --- Move to FPC3
xe-4/0/1 - Core: asw2-b-eqiad:xe-2/0/45 {#3457} --- Move to FPC3
xe-4/0/2 - Core: asw2-c-eqiad:xe-7/0/44 {#2627} --- Move to FPC1 - 40G
xe-4/0/3 - Core: asw2-d-eqiad:xe-7/0/40 {#3465} --- Move to FPC1 - 40G
xe-4/1/0 - Core: asw2-a-eqiad:xe-7/0/45 {#3454} --- Move to FPC3
xe-4/1/1 - Core: asw2-b-eqiad:xe-7/0/44 {#3459} --- Move to FPC3
xe-4/1/2 - Core: asw2-c-eqiad:xe-7/0/45 {#3463} --- Move to FPC1 - 40G
xe-4/3/3 - Core: asw2-d-eqiad:xe-7/0/41 {#3537} --- Move to FPC1 - 40G
xe-4/2/0 - Transport: cr1-codfw:xe-1/1/1:3 (Telia, 10Gbps wave) {#5226} --- Move to FPC1 - 100G
xe-4/2/2 - Transport: GTT (680970, 10Gbps VPLS) {#3466} --- Move to FPC3
xe-4/3/0 - Core: cr2-eqiad:xe-4/3/0 {#3456} --- Move to FPC1 - 10G breakout (1 or 2x10G)
xe-4/3/1 - Peering: Hurricane Electric (N/A) {#3909} --- Move to FPC3
xe-4/3/2 - Transit: NTT {#3475} --- Move to FPC3

@Papaul, what do you think of this new, easier (and hopefully final) plan?

@ayounsi looks good to me. I will go ahead and redo the table for cr1 for now.

@Papaul, nice!
We should keep all of a given switch's uplinks on the same breakout cable.
So instead of doing:
0/0 - asw2-c-eqiad:xe-2/0/[44-45] - asw2-d-eqiad:xe-2/0/[40-41]
0/1 - asw2-c-eqiad:xe-7/0/[44-45] - asw2-d-eqiad:xe-7/0/[40-41]

I recommend:
0/0 - asw2-c-eqiad:xe-2/0/[44-45] - asw2-c-eqiad:xe-7/0/[44-45]
0/1 - asw2-d-eqiad:xe-2/0/[40-41] - asw2-d-eqiad:xe-7/0/[40-41]

This will help move to a 40G SFP later down the road, and we still have the redundancy between cr1 and cr2.

Other than that it looks good to me.

Let me know if you prefer to complete cr1 fully (get the optics, do the recabling) and only then tackle cr2, or to plan cr2 first. Both work for me.

@ayounsi yes i can complete cr1 but of course with your help.

Thanks

Great, so the next steps are:

  1. Install the breakout panels, (document them, similar to T304710: Document codfw breakout patch panels in Netbox)
  2. Pre-populate the ports/panels that will be used with the matching optics
  3. [MPC7E] Configure the port speed, then bounce the PICs (this might be disruptive with the link to cages E/F)
  4. Move the ports, probably starting with FPC3 (once T312745: cr2-eqiad:FPC3 partial failure (PIC2/3) is solved), then the FPC7E

First the fpc4->fpc3 move as it will be the easiest

@ayounsi I moved the asw2-c and asw2-d uplinks from 0/0 and 0/1 to 1/0 and 1/3 on both routers to match codfw. In the future, if we have to change row A and row B from 4x10G to 1x40G, we can use 0/0 for row A and 0/3 for row B so it matches codfw.

@Jclark-ctr when you have time, can you plug in the following:

CR1-eqiad to front of the patch panel

port 0/1: 1 breakout cable; the first breakout goes to the 1st fiber cassette, port 9/10 on the patch panel (you can connect the other breakouts to ports 11/12, 13/14 and 15/16, but we are going to use only the one connected to port 9/10 for now)

port 1/0: 1 breakout cable; the first breakout goes to the 3rd fiber cassette, port 1/2 on the patch panel
port 1/0: 1 breakout cable; the second breakout goes to the 3rd fiber cassette, port 3/4 on the patch panel
port 1/0: 1 breakout cable; the third breakout goes to the 3rd fiber cassette, port 5/6 on the patch panel
port 1/0: 1 breakout cable; the fourth breakout goes to the 3rd fiber cassette, port 7/8 on the patch panel

port 1/3: 1 breakout cable; the first breakout goes to the 4th fiber cassette, port 1/2 on the patch panel
port 1/3: 1 breakout cable; the second breakout goes to the 3rd fiber cassette, port 3/4 on the patch panel
port 1/3: 1 breakout cable; the third breakout goes to the 3rd fiber cassette, port 5/6 on the patch panel
port 1/3: 1 breakout cable; the fourth breakout goes to the 3rd fiber cassette, port 7/8 on the patch panel

CR2-eqiad to front of the patch panel

port 0/1: 1 breakout cable; the first breakout goes to the 1st fiber cassette, port 9/10 on the patch panel
port 0/1: 1 breakout cable; the second breakout goes to the 1st fiber cassette, port 11/12 on the patch panel
(you can connect the other breakouts to ports 13/14 and 15/16, but we are going to use only the first 2, connected to ports 9/10 and 11/12, for now)

port 1/0: 1 breakout cable; the first breakout goes to the 3rd fiber cassette, port 1/2 on the patch panel
port 1/0: 1 breakout cable; the second breakout goes to the 3rd fiber cassette, port 3/4 on the patch panel
port 1/0: 1 breakout cable; the third breakout goes to the 3rd fiber cassette, port 5/6 on the patch panel
port 1/0: 1 breakout cable; the fourth breakout goes to the 3rd fiber cassette, port 7/8 on the patch panel

port 1/3: 1 breakout cable; the first breakout goes to the 4th fiber cassette, port 1/2 on the patch panel
port 1/3: 1 breakout cable; the second breakout goes to the 3rd fiber cassette, port 3/4 on the patch panel
port 1/3: 1 breakout cable; the third breakout goes to the 3rd fiber cassette, port 5/6 on the patch panel
port 1/3: 1 breakout cable; the fourth breakout goes to the 3rd fiber cassette, port 7/8 on the patch panel

How to label the breakout cables?

Here is an example: 10905_12273-1
12273 is the cable ID on the cr* side
10905 is the cable ID of the fiber coming in at the rear of the patch panel
1 is the first breakout, so the second breakout will be 12273-2, the third 12273-3, and so on

cr*side----12273-----------first_breakout-cable patch panel front-----10905_12273-1----rear of the patch panel-----10905
cr*side----12273-----------second_breakout-cable patch panel front-----10906_12273-2----rear of the patch panel-----10906
cr*side----12273-----------third_breakout-cable patch panel front-----10907_12273-3----rear of the patch panel-----10907
cr*side----12273-----------fourth_breakout-cable patch panel front-----10908_12273-4----rear of the patch panel-----10908
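
The labeling scheme above can be sketched as a small helper. This is only an illustration: the function name `breakout_labels` is hypothetical, and the IDs are the example values from this comment, not a statement about which cables were actually installed.

```python
def breakout_labels(cr_cable_id, rear_cable_ids):
    """Build one label per breakout: <rear cable ID>_<cr-side cable ID>-<breakout number>.

    cr_cable_id:    cable ID of the breakout cable on the cr* side
    rear_cable_ids: cable IDs of the fibers at the rear of the patch panel,
                    in breakout order (first breakout first)
    """
    return [
        f"{rear_id}_{cr_cable_id}-{number}"
        for number, rear_id in enumerate(rear_cable_ids, start=1)
    ]


labels = breakout_labels(12273, [10905, 10906, 10907, 10908])
for label in labels:
    print(label)
```

With the example IDs, the first label printed is 10905_12273-1, matching the example given above.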

Once you create the cable ID for the cr* side, you can just update the task with it.
To find the cable ID for the rear side of the patch panel, look at the table in the description to see what will be connected to the rear side of the patch panel. If you have any questions, let me know.

Thanks

On cr1-eqiad, all the interfaces are set up for the asw2-c and asw2-d move:

papaul@re0.cr1-eqiad> show interfaces terse | match xe-1/1
xe-1/1/0:0              down  down
xe-1/1/0:1              down  down
xe-1/1/0:2              down  down
xe-1/1/0:3              down  down
xe-1/1/3:0              down  down
xe-1/1/3:1              down  down
xe-1/1/3:2              down  down
xe-1/1/3:3              down  down

On cr2-eqiad, the interface setup is complete as well:

papaul@re0.cr2-eqiad# run show interfaces terse | match xe-1/1/*
xe-1/1/0:0              down  down
xe-1/1/0:1              down  down
xe-1/1/0:2              down  down
xe-1/1/0:3              down  down
xe-1/1/3:0              down  down
xe-1/1/3:1              down  down
xe-1/1/3:2              down  down
xe-1/1/3:3              down  down
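
A check like the one done by eye above (all four 10G channels present on each breakout port) could be scripted against the pasted output. This is a rough sketch under stated assumptions: the helper names `channelized_ports` and `missing_channels` are hypothetical, and the input is plain text copied from `show interfaces terse`, not a live query of the router.

```python
import re

# Output pasted from the router, as shown above.
TERSE = """\
xe-1/1/0:0              down  down
xe-1/1/0:1              down  down
xe-1/1/0:2              down  down
xe-1/1/0:3              down  down
xe-1/1/3:0              down  down
xe-1/1/3:1              down  down
xe-1/1/3:2              down  down
xe-1/1/3:3              down  down
"""


def channelized_ports(terse_output):
    """Map each breakout port (e.g. 'xe-1/1/0') to the sorted channel numbers seen."""
    ports = {}
    for line in terse_output.splitlines():
        match = re.match(r"(xe-\d+/\d+/\d+):(\d+)\s", line)
        if match:
            ports.setdefault(match.group(1), []).append(int(match.group(2)))
    return {port: sorted(channels) for port, channels in ports.items()}


def missing_channels(terse_output, expected_ports, lanes=4):
    """Return {port: [missing channel numbers]} for ports lacking any of the lanes."""
    seen = channelized_ports(terse_output)
    missing = {}
    for port in expected_ports:
        absent = [lane for lane in range(lanes) if lane not in seen.get(port, [])]
        if absent:
            missing[port] = absent
    return missing


print(missing_channels(TERSE, ["xe-1/1/0", "xe-1/1/3"]))
```

An empty result means every expected port shows channels :0 through :3. It says nothing about link state, which is still down at this stage.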

@ayounsi everything is ready on the routers to start moving the links. Sorry I am late on this; I had to finish the PDU maintenance first.

Nice! let me know when we're ready to do the move and when you would like to do it.

I asked @Jclark-ctr to run the 40G fiber for row C and row D, and he said he will get it done sometime next week. Once the fiber is in place I will update you.

@Jclark-ctr for more information: with the MMF/MTP fibers ordered in https://phabricator.wikimedia.org/T313464, we want:

1 fiber from rack c2 to rack a1
1 fiber from rack c7 to rack a8

1 fiber from rack d2 to rack a1
1 fiber from rack d7 to rack a8

Thanks

Cables have been run between racks
c2 <-- G2204190495000069 --> a1
c7 <-- G2204190495000136 --> a8

d2 <-- G2204190495000072 --> a1
d7 <-- G2204190495000097 --> a8

The 40G ports in cr1-eqiad and cr2-eqiad to connect asw-c2/c7 and asw-d2/d7 are ready

papaul@re0.cr1-eqiad> show interfaces terse | match et-1/1/
et-1/1/0                down  down
et-1/1/3                down  down


papaul@re0.cr2-eqiad> show interfaces terse | match et-1/1/*
et-1/1/0                down  down
et-1/1/3                down  down

@Jclark-ctr @Cmjohnson I am planning on moving the links on cr[1-2]-eqiad from FPC4 to FPC3 (the ones listed in both the "cr1-eqiad from FPC4 to FPC3" and "cr2-eqiad from FPC4 to FPC3" tables) next week, on Monday 17th at 15:00 UTC. Before I send out the notification email, I wanted to check with you first whether the date and time work for you.

Thanks.

Change 843513 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] cr1-eqiad: rename GTT interface

https://gerrit.wikimedia.org/r/843513

Change 843513 merged by jenkins-bot:

[operations/homer/public@master] cr1-eqiad: rename GTT interface

https://gerrit.wikimedia.org/r/843513


@cmooney hey, I was about to set up sub-ports on fpc1 pic0 on both cr1-eqiad and cr2-eqiad and realized that lsw1-[e1-f1] are connected to pic0 of fpc1 on both routers. If I set up the sub-ports I will have to bounce pic0. I know that if I do cr1 first and then cr2, the traffic from nodes connected to the leaf switches that goes through lsw1-e1 will switch to lsw1-f1, and if I do cr2 it will be the other way around. Just wanted to check with you first.

Thanks

@Papaul hey. I think it can be done in any order. Probably best to hard down the port first to be safe (which will cause the CR to down the BGP session actively in advance) i.e.

set interfaces et-1/0/2 disable

Then set up the sub-ports and bounce pic0. When done, delete the disable, check that BGP comes back up, and then do the other one. Any questions, let me know. Thanks.


sub-ports are ready on cr[1-2]-eqiad

papaul@re0.cr1-eqiad# run show interfaces terse | match xe-1/0/1*
xe-1/0/1:0              down  down
xe-1/0/1:1              down  down
xe-1/0/1:2              down  down
xe-1/0/1:3              down  down

papaul@re0.cr2-eqiad# run show interfaces terse | match xe-1/0/*
xe-1/0/1:0              down  down
xe-1/0/1:1              down  down
xe-1/0/1:2              down  down
xe-1/0/1:3              down  down

Summarizing what's left on FPC4 in terms of physical interfaces, leaving asw2-d-eqiad aside for now, as we're tackling it in T313463:

cr1-eqiad:
xe-4/3/0 up up Core: cr2-eqiad:xe-4/3/0 {#3456}

cr2-eqiad:
xe-4/2/0 up up Transport: cr2-eqord:xe-0/1/5 (Arelion, IC-314533, 24ms 10Gbps wave) {#3658}
xe-4/3/0 up up Core: cr1-eqiad:xe-4/3/0 {#3456}

Once that's over we can focus on the GRE tunnels.

@Jclark-ctr are you available this Monday or Tuesday to move them?

Mentioned in SAL (#wikimedia-operations) [2023-01-16T13:35:13Z] <XioNoX> disable one of 3 cr1-cr2 eqiad links - T304712

Change 880478 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] eqiad: move GRE tunnels out of FPC4

https://gerrit.wikimedia.org/r/880478

Change 880478 merged by Ayounsi:

[operations/homer/public@master] eqiad: move links out of FPC4

https://gerrit.wikimedia.org/r/880478

Thanks John and Papaul, as soon as Netbox is updated this can be closed!