Q2:(Need By: TBD) Rows E/F network racking task
Closed, ResolvedPublic

Description

This task will track the racking, setup, and dc ops deployment of the new switches for rows E & F in the new cage/expansion.

This is not a typical racking task, due to the nature of this deployment. While this task initially has info for both rows E and F, row F may be moved to another task later as its expansion/buildout is after row E.

Hostnames

Hostnames will need to be reviewed by netops, as we've not used dedicated spine switches in the past. The current assumption is 'spine-rack-eqiad', once the location of the spine switches is decided in T291485#7375305.

Cabling Plan / Diagrams

netops to provide detailed directions in this section on the cabling diagram/plan for the routing of cables and uplinks between racks. This cannot be fully provided until a decision is made on spine switch placement.

Per host setup checklist

The below checklist has all the steps required for each switch to be brought online by DC-Ops for netops configuration and deployment.

ssw1-e1-eqiad (QFX5120-32C-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack; spine switch rack and hostnames will be determined later. (Example: if it goes into E2 and E7, then it would be spine-e2-eqiad and spine-e7-eqiad.)
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

ssw1-f1-eqiad (QFX5120-32C-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack; spine switch rack and hostnames will be determined later. (Example: if it goes into E2 and E7, then it would be spine-e2-eqiad and spine-e7-eqiad.)
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

lsw-e1-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

lsw-e2-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

lsw-e3-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

cloudsw1-e4-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

lsw-f1-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

lsw-f2-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

lsw-f3-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

cloudsw1-f4-eqiad (QFX5120-48Y-AFI):

  • - receive in system on procurement task T287591 & in coupa
  • - rack into assigned rack (see hostname of switch for non spines) and enter switch into netbox with all associated information
  • - label and power up switch
  • - connect new switch to a port on the scs
  • - test scs connection and verify connectivity
  • - see Cabling Plan section above for how to connect to rest of the row, connect as advised

TODO: duplicate the above checklist for row F, or create a new task for Row F. Do not resolve this task without doing one of these.

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.

Hey Guys,

The cabling plan for the switch->switch cabling in the new Eqiad cage should be as follows:

LSW1-E1 Links:
    LSW1-E1 et-0/0/48 - CR1

    LSW1-E1 et-0/0/49 - LSW1-E2 et-0/0/54
    LSW1-E1 et-0/0/50 - LSW1-E3 et-0/0/54
    LSW1-E1 et-0/0/51 - LSW1-E4 et-0/0/54

    LSW1-E1 et-0/0/52 - LSW1-F1 et-0/0/52

    LSW1-E1 et-0/0/53 - LSW1-F2 et-0/0/54
    LSW1-E1 et-0/0/54 - LSW1-F3 et-0/0/54
    LSW1-E1 et-0/0/55 - LSW1-F4 et-0/0/54
LSW1-F1 Links:
    LSW1-F1 et-0/0/48 - CR2

    LSW1-F1 et-0/0/49 - LSW1-E2 et-0/0/55
    LSW1-F1 et-0/0/50 - LSW1-E3 et-0/0/55
    LSW1-F1 et-0/0/51 - LSW1-E4 et-0/0/55

    LSW1-F1 et-0/0/52 - LSW1-E1 et-0/0/52

    LSW1-F1 et-0/0/53 - LSW1-F2 et-0/0/55
    LSW1-F1 et-0/0/54 - LSW1-F3 et-0/0/55
    LSW1-F1 et-0/0/55 - LSW1-F4 et-0/0/55
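
The link list above can be sanity-checked before anyone pulls fiber. The following is only a minimal sketch, not part of the plan itself: the function names are made up for illustration, and the CR uplinks are left out because their far-end ports were not confirmed at this point. It collapses the E1-F1 link that is listed from both sides and reports any port that ends up in more than one link.

```python
from collections import Counter

# Switch-to-switch links copied from the plan above (CR uplinks omitted).
LINKS = [
    ("LSW1-E1", "et-0/0/49", "LSW1-E2", "et-0/0/54"),
    ("LSW1-E1", "et-0/0/50", "LSW1-E3", "et-0/0/54"),
    ("LSW1-E1", "et-0/0/51", "LSW1-E4", "et-0/0/54"),
    ("LSW1-E1", "et-0/0/52", "LSW1-F1", "et-0/0/52"),
    ("LSW1-E1", "et-0/0/53", "LSW1-F2", "et-0/0/54"),
    ("LSW1-E1", "et-0/0/54", "LSW1-F3", "et-0/0/54"),
    ("LSW1-E1", "et-0/0/55", "LSW1-F4", "et-0/0/54"),
    ("LSW1-F1", "et-0/0/49", "LSW1-E2", "et-0/0/55"),
    ("LSW1-F1", "et-0/0/50", "LSW1-E3", "et-0/0/55"),
    ("LSW1-F1", "et-0/0/51", "LSW1-E4", "et-0/0/55"),
    ("LSW1-F1", "et-0/0/52", "LSW1-E1", "et-0/0/52"),
    ("LSW1-F1", "et-0/0/53", "LSW1-F2", "et-0/0/55"),
    ("LSW1-F1", "et-0/0/54", "LSW1-F3", "et-0/0/55"),
    ("LSW1-F1", "et-0/0/55", "LSW1-F4", "et-0/0/55"),
]


def unique_links(links):
    """Collapse links that are listed from both ends into one unordered pair."""
    return {frozenset([(a, a_port), (b, b_port)]) for a, a_port, b, b_port in links}


def port_conflicts(links):
    """Return (device, port) endpoints that appear in more than one distinct link."""
    counts = Counter(end for link in unique_links(links) for end in link)
    return sorted(end for end, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(len(unique_links(LINKS)), "distinct links")
    print("conflicts:", port_conflicts(LINKS))
```

An empty conflict list means every port in the plan is used for exactly one cable.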

Please note the naming convention, which we have decided to change based on the new physical/logical setup: LSW = "Leaf Switch" (instead of "ASW"), for the QFX5120-48Y devices.

All connections are 100GBase-CWDM4 with LC-LC single-mode fiber connections. Only exception is the CR links which will be LC-SC to land on the patch panel. We need to confirm the exact ports for those for now, and a final decision on T293221 is needed too. But the racking and switch->switch cabling in the new cage can proceed based on this plan.

Be aware this cabling is temporary as we do not yet have the QFX5120-32C devices to act as Spines (which we will call "SSW"). So this plan involves using the Leaf devices in E1/F1 as temporary spines, which will need to be changed when the real spine hardware arrives. Please bear in mind that those Spines will go into rack E1/F1 when they arrive so best to pre-allocate the space for them now.

Diagram here:

Ping me if there are any questions.

RobH added parent tasks: Unknown Object (Task), Unknown Object (Task). Nov 24 2021, 11:28 PM
RobH added a subtask: Unknown Object (Task). Nov 24 2021, 11:57 PM
RobH removed a subtask: Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).

Only exception is the CR links which will be LC-SC to land on the patch panel.

I should clarify that if we pre-cable the patch panel ports to fibre panels in our own racks, and those panels we install have LC connectors on them, then the above quote does not apply. In that scenario the link runs from switch --> "our_installed_panel" and will be LC-LC.

@Jclark-ctr I think we can assign the CR ports like this:

cr1-eqiad et-1/0/2  ---->  lsw1-e1-eqiad  et-0/0/48
cr2-eqiad et-1/0/2  ---->  lsw1-f1-eqiad  et-0/0/48

cr1-eqiad et-1/0/2 ----> lsw1-e1-eqiad et-0/0/48 connected using patch panel number 2190001 and cable ID's lsw1-demarc (new cage) 011820203 and demarc (old cage) to cr1-eqiad 011820201

cr2-eqiad et-1/0/2 ----> lsw1-f1-eqiad et-0/0/48 connected using patch panel number 2190002 and cable ID's lsw1-demarc (new cage) 011820204 and demarc (old cage) to cr1-eqiad 011820202

@Cmjohnson thanks!

Unfortunately seems to be some kind of problem. Neither are showing up.

cr1-eqiad et-1/0/2 ----> lsw1-e1-eqiad et-0/0/48

Not seeing any light ingress on either side. cr1-eqiad side:

cmooney@re0.cr1-eqiad> show interfaces et-1/0/2 
Physical interface: et-1/0/2, Enabled, Physical link is Down
cmooney@re0.cr1-eqiad> show interfaces diagnostics optics et-1/0/2 | match "Laser receiver" | except "low|high" 
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm

lsw1-e1-eqiad side:

cmooney@lsw1-e1-eqiad> show interfaces et-0/0/48    
Physical interface: et-0/0/48, Enabled, Physical link is Down
cmooney@lsw1-e1-eqiad> show interfaces diagnostics optics et-0/0/48 | match "Laser receiver" | except "low|high"   
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm

cr2-eqiad et-1/0/2 ----> lsw1-f1-eqiad et-0/0/48

Not seeing any light ingress on either side on this one either. cr2-eqiad side:

cmooney@re0.cr2-eqiad> show interfaces et-1/0/2                                                                    
Physical interface: et-1/0/2, Enabled, Physical link is Down
cmooney@re0.cr2-eqiad> show interfaces diagnostics optics et-1/0/2 | match "Laser receiver" | except "low|high" 
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm

lsw1-f1-eqiad side:

root@lsw1-f1-eqiad> show interfaces et-0/0/48 
Physical interface: et-0/0/48, Enabled, Physical link is Down
root@lsw1-f1-eqiad> ...gnostics optics et-0/0/48 | match "Laser receiver" |   
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
    Laser receiver power                      :  0.000 mW / -40.04 dBm
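
The four outputs above all show the same thing: every lane reads -40.04 dBm, i.e. no light received. If you want to check this from saved CLI output rather than by eye, a small parser is enough. This is a rough sketch only; the function names and the -30 dBm "no light" floor are assumptions chosen for illustration, not a Juniper or optic vendor threshold.

```python
import re

# Matches lines such as:
#     Laser receiver power                      :  0.000 mW / -40.04 dBm
LANE_RE = re.compile(r"Laser receiver power\s*:\s*[\d.]+ mW / (-?[\d.]+) dBm")


def rx_dbm_per_lane(cli_output):
    """Return the receive power in dBm for each lane found in the output."""
    return [float(m.group(1)) for m in LANE_RE.finditer(cli_output)]


def has_light(cli_output, floor_dbm=-30.0):
    """True only if at least one lane was found and all lanes are above the floor.

    floor_dbm is an illustrative assumption, not a vendor specification.
    """
    lanes = rx_dbm_per_lane(cli_output)
    return bool(lanes) and all(dbm > floor_dbm for dbm in lanes)


if __name__ == "__main__":
    sample = (
        "    Laser receiver power                      :  0.000 mW / -40.04 dBm\n"
        "    Laser receiver power                      :  0.000 mW / -40.04 dBm\n"
    )
    print(rx_dbm_per_lane(sample), has_light(sample))
```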

Can you check the patches? Potentially just the two strands of the fiber got reversed between one side and the other, so swapping them around might fix it.

@cmooney I went through all the cabling and confirmed the correct patches. The connections at the demarc are pretty foolproof, with the A and B side labeled, and the fiber has a plastic holder so you cannot mix it up. Are the ports on CR1 and CR2 enabled?

@cmooney it appears to be disabled

cmjohnson@re0.cr1-eqiad> show interfaces descriptions
Interface Admin Link Description
et-1/0/2 down down DISABLED

@Cmjohnson thanks.

The interfaces on the CR are down by default. Not sure if you changed anything but there is no improvement right now if I enable them, still -40dBm light both sides. We can't leave them enabled as it'll generate alert mails about config diffs.

Anyway if we are confident the patches look physically correct I guess we need to test some other way. Do you have an OTDR or light meter to test the fiber end-to-end? Only thing I can really think of, could obviously be an issue with the structured cabling between the cages.

@cmooney, I have a light meter and I see light from lsw-f and lsw-e to the demarc, and then I see light to cr1 and cr2 from the old cage demarc. Maybe it is the optics?

@Cmjohnson thanks ok.

yeah it is odd. All the switch->switch links have come up ok (using the same CWDM4 optics), so it'd be unusual that (at least) 2 of the 4 used on these links are bad. But not impossible.

Can you swap the optics both sides of one of the links if we have spares? I'll check here if there is any improvement.

thanks.

@Cmjohnson reversed the fibers and we got the links up:

cmooney@re0.cr1-eqiad> show interfaces diagnostics optics et-1/0/2 | match "receiver power" | except "high|low"    
    Laser receiver power                      :  0.972 mW / -0.12 dBm
    Laser receiver power                      :  0.820 mW / -0.86 dBm
    Laser receiver power                      :  1.028 mW / 0.12 dBm
    Laser receiver power                      :  0.968 mW / -0.14 dBm
cmooney@re0.cr2-eqiad> show interfaces diagnostics optics et-1/0/2 | match "receiver power" | except "high|low" 
    Laser receiver power                      :  1.590 mW / 2.01 dBm
    Laser receiver power                      :  1.290 mW / 1.11 dBm
    Laser receiver power                      :  0.756 mW / -1.22 dBm
    Laser receiver power                      :  0.518 mW / -2.85 dBm
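
For anyone reading these numbers later: the two units in each line are the same measurement, related by dBm = 10 * log10(power in mW). A quick sketch of the conversion (helper names are illustrative only):

```python
import math


def mw_to_dbm(mw):
    """Convert optical power in milliwatts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(mw)


def dbm_to_mw(dbm):
    """Convert optical power in dBm back to milliwatts."""
    return 10 ** (dbm / 10)


if __name__ == "__main__":
    for mw in (0.972, 0.820, 1.028, 0.968):
        print(f"{mw:.3f} mW = {mw_to_dbm(mw):.2f} dBm")
    # The earlier "no light" reading: -40.04 dBm is about 0.0001 mW,
    # which is why the CLI shows it as 0.000 mW.
    print(f"-40.04 dBm = {dbm_to_mw(-40.04):.6f} mW")
```

Recomputing from the three-decimal mW figure can differ from the displayed dBm in the last digit, presumably because the device rounds each unit separately.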

Thanks for that!

I think this task is pretty much complete now, unless we want to keep it open for when the real Spines/QFX5120-32C arrive?

Actually one thing that is outstanding I believe is to confirm the cable IDs?

Inter-Switch Links

I documented the inter-switch links in Netbox:

https://netbox.wikimedia.org/dcim/devices/3929/interfaces/

https://netbox.wikimedia.org/dcim/devices/3932/interfaces/

Cable IDs have just been generated randomly as I didn't have info on any IDs. Are there labels/cable IDs on the links in the new cage? If so can the labels on the links on the above pages be updated to match what's on the cables?

Inter-Cage Links

Also we need to document the LSW -> CR links in Netbox. These are somewhat new for us, as they are internal to Eqiad yet traverse several patch panels. How to document that in Netbox was the discussion of T293221. I'm not sure if there was a final decision but I think it was leaning towards using a "dummy circuit" (and documenting the patching in the free text of that), rather than creating the panels themselves in Netbox.

@wiki_willy I think we need to confirm the final decision there, as it's a physical thing I think that call probably should be with DC-Ops. Once we're decided how to do it I can add the detail to Netbox no problem, but in either case I'll need to get the detail of the LSW to CR runs that are in place. I know the device ports, but need to know the patch panel names, locations, port details, and the cable labels.

Any questions just feed back here or on irc. Thanks!

Hi @cmooney - here's the doc that @Jclark-ctr put together when running the cables for the inter-switch links. Some of the cables had mismatched serial numbers from the vendor (that John highlighted in yellow), but everything else should cover the cable ids:

https://docs.google.com/spreadsheets/d/1RIhDTbiYRjmAQkM-cr2FGcDx7tVFbCW6hYVObglYoIU/edit#gid=0

For the inter-cage cables, @Cmjohnson - can you add those onto the spreadsheet, and either Chris or John - get all the inter-switch and inter-cage cable ids entered into Netbox?

Regarding a final decision on how we document these patch panels, I'll give it a couple more days for any feedback from the team in T293221, before moving forward. I have an idea, but just want to give a final opportunity for comments.

Thanks,
Willy

Thanks @wiki_willy for sharing the link. It is missing the IDs for the fiber links between switches though, it just has the console links from the Leaf switches and management ones.

In terms of the panels I'd still be in favour of using Netbox to model them fully, especially with the "in-rack" panels now making it a multi-hop link. But I'm happy enough with whatever the decision is, I understand there are various views.

cmooney updated the task description.
Jclark-ctr updated the task description.

@cmooney i have connected spine switches to scs and updated netbox

@Jclark-ctr super thanks for that! I'll open a task and start planning how we take care of the move.

@Jclark-ctr I'm not getting any output on port 20 or 29 of the scs-f8. Are the two Junipers powered on?

If not can you double check the cabling? They should be connected to the 'CON' port (number 5 here: https://www.juniper.net/documentation/us/en/hardware/qfx5120/qfx5120.pdf#unique_3_Connect_42_d50e632)

@Jclark-ctr - just following up Cathal's last comment

@Jclark-ctr bit of a heads up I'm hoping to get the migration kicked off for those Juniper Spine devices now that we've got the licencing sorted. Plan detailed in T322937.

Will need to discuss with Willy and others, as there are optics we need to buy etc. but that's the overall plan. If we do proceed that way step 1 will be to get these links cabled up:

Two links between racks E1 and F1:

LSW1-E1 to SSW1-F1: https://netbox.wikimedia.org/dcim/interfaces/27396/trace/
LSW1-F1 to SSW1-E1: https://netbox.wikimedia.org/dcim/interfaces/27371/trace/

And two more that stay within those racks:

LSW1-E1 to SSW1-E1: https://netbox.wikimedia.org/dcim/interfaces/27363/trace/
LSW1-F1 to SSW1-F1: https://netbox.wikimedia.org/dcim/interfaces/27404/trace/

Those can be added any time once we get the gear. The steps after that will need to be co-ordinated between us, but the actual cable moves are all within E1 or F1 each time. Thanks.

cmooney mentioned this in Unknown Object (Task). Nov 15 2022, 11:11 AM

@Jclark-ctr can I get an update on the situation here / estimate of when we might be able to add the 4 links detailed above? Ping me on irc if any questions thanks.

@cmooney sorry for the delay, finished connecting the links and updated the cable IDs

@cmooney Racks e5-7 and f5-7 have been cabled and racked. Do you want to use the same ticket for those switches?

Let's use a new task for the new racks and keep this one for the spines. Speaking of spines we might want to hold on cabling the new ToRs before the spines are ready so we don't have to move the cables.

I've created task T334231 to track the cabling for the new racks.

@Jclark-ctr hey. It's taken a bit of time to line this up, hit a few bumps in the road with the Juniper config.

As detailed in T322937#8862660 we need to change the plan slightly and move the LVS connections before we tackle moving the row E/F LEAF switch links from lsw to ssw.

First one we want to tackle is the lvs1020 link, as this is our backup lvs. It's in rack F1 so we can just move the b-end of the cable from one switch to the other in the same rack:

Current device   Current port   New device       New port
lsw1-f1-eqiad    xe-0/0/47      ssw1-f1-eqiad    xe-0/0/33

Let me know when you think we might be able to do this. We'll need to work together on it as I need to adjust the switch config when we make the move.

Thanks.

@cmooney i am available tomorrow if you would like to address it that quickly. otherwise monday

@Jclark-ctr thanks, yeah I just had a word with @ssingh and I think tomorrow is probably possible.

What time suits you to be on site?

@Jclark-ctr all went well with that today thank you for your help.

For the next phase we need to move the following links:

No  Rack  LVS Server  Old Switch     Old Int    New Switch     New Int
1   E1    lvs1018     lsw1-e1-eqiad  xe-0/0/47  ssw1-e1-eqiad  xe-0/0/33
2   F1    lvs1019     lsw1-f1-eqiad  xe-0/0/46  ssw1-f1-eqiad  xe-0/0/32
3   E1    lvs1017     lsw1-e1-eqiad  xe-0/0/46  ssw1-e1-eqiad  xe-0/0/32

We can tackle in the order they are listed. The changes here are a bit more involved, as these are handling live traffic, so we will need to depool them, move the cable, test connections, then re-pool. Should be fairly quick and straightforward though.
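
The ordering of those steps matters more than the steps themselves: the host should only go back into the pool once the link has been tested on the new port. A bare-bones sketch of that sequencing is below. Every callable passed in is a hypothetical placeholder for whatever tooling or manual action is actually used; nothing here reflects real depool/pool commands.

```python
def move_lvs_link(host, depool, move_cable, link_ok, repool):
    """Depool, move the cable, test, and re-pool only if the test passes.

    All four callables are hypothetical hooks supplied by the caller.
    Returns True if the host was re-pooled, False if it was left depooled.
    """
    depool(host)
    move_cable(host)
    if not link_ok(host):
        return False  # host stays depooled so the link can be investigated
    repool(host)
    return True


if __name__ == "__main__":
    log = []
    done = move_lvs_link(
        "lvs1018",
        depool=lambda h: log.append(f"depool {h}"),
        move_cable=lambda h: log.append(f"move cable for {h}"),
        link_ok=lambda h: True,
        repool=lambda h: log.append(f"repool {h}"),
    )
    print(done, log)
```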

There is no great rush, the earliest we can probably start is next Tues (May 23rd). 09.30 local / 13.30 UTC worked well today, if that time suits for the rest that's good with me (as long as @ssingh agrees).

thanks!

Thanks for checking Cathal! That time works for me and I will depool the host before that.

Completed today
1 E1 lvs1018 lsw1-e1-eqiad xe-0/0/47 ssw1-e1-eqiad xe-0/0/33

Thanks @Jclark-ctr I think we're good to do the other two lvs moves whenever you are ready. Please ping me on irc and we can arrange.

I've updated the table to list out all the moves we need to do to close out T322937 and this task. All moves are similar just moving cable from one switch to the other in the same rack. There are some changes before/after each one but all stuff I can take care of fairly quickly.

No  Rack  Description        Old Switch     Old Int    New Switch     New Int
1   E1    lvs1018            lsw1-e1-eqiad  xe-0/0/47  ssw1-e1-eqiad  xe-0/0/33
2   F1    lvs1019            lsw1-f1-eqiad  xe-0/0/46  ssw1-f1-eqiad  xe-0/0/32
3   E1    lvs1017            lsw1-e1-eqiad  xe-0/0/46  ssw1-e1-eqiad  xe-0/0/32
4   E1    CR1 Uplink         lsw1-e1-eqiad  et-0/0/48  ssw1-e1-eqiad  et-0/0/31
5   F1    CR2 Uplink         lsw1-f1-eqiad  et-0/0/48  ssw1-f1-eqiad  et-0/0/31
6   E1    E1 to E2 Downlink  lsw1-e1-eqiad  et-0/0/49  ssw1-e1-eqiad  et-0/0/1
7   F1    F1 to E2 Downlink  lsw1-f1-eqiad  et-0/0/49  ssw1-f1-eqiad  et-0/0/1
8   E1    E1 to E3 Downlink  lsw1-e1-eqiad  et-0/0/50  ssw1-e1-eqiad  et-0/0/2
9   F1    F1 to E3 Downlink  lsw1-f1-eqiad  et-0/0/50  ssw1-f1-eqiad  et-0/0/2
10  E1    E1 to F2 Downlink  lsw1-e1-eqiad  et-0/0/53  ssw1-e1-eqiad  et-0/0/9
11  F1    F1 to F2 Downlink  lsw1-f1-eqiad  et-0/0/53  ssw1-f1-eqiad  et-0/0/9
12  E1    E1 to F3 Downlink  lsw1-e1-eqiad  et-0/0/54  ssw1-e1-eqiad  et-0/0/10
13  F1    F1 to F3 Downlink  lsw1-f1-eqiad  et-0/0/54  ssw1-f1-eqiad  et-0/0/10
14  E1    LSW1-E1 port move  lsw1-e1-eqiad  et-0/0/51  lsw1-e1-eqiad  et-0/0/54
15  F1    LSW1-F1 port move  lsw1-f1-eqiad  et-0/0/51  lsw1-f1-eqiad  et-0/0/54
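
One thing worth noting in the table above is that moves 14 and 15 land on et-0/0/54, which is only free once moves 12 and 13 have vacated it, so order matters. The sketch below encodes the table and checks two things: no destination port is used twice, and any destination that is also someone's old port has been freed by an earlier move. It is only an illustration; the function names are invented here and the check knows nothing about ports outside this table.

```python
# (No, old switch, old int, new switch, new int), copied from the table above.
MOVES = [
    (1, "lsw1-e1-eqiad", "xe-0/0/47", "ssw1-e1-eqiad", "xe-0/0/33"),
    (2, "lsw1-f1-eqiad", "xe-0/0/46", "ssw1-f1-eqiad", "xe-0/0/32"),
    (3, "lsw1-e1-eqiad", "xe-0/0/46", "ssw1-e1-eqiad", "xe-0/0/32"),
    (4, "lsw1-e1-eqiad", "et-0/0/48", "ssw1-e1-eqiad", "et-0/0/31"),
    (5, "lsw1-f1-eqiad", "et-0/0/48", "ssw1-f1-eqiad", "et-0/0/31"),
    (6, "lsw1-e1-eqiad", "et-0/0/49", "ssw1-e1-eqiad", "et-0/0/1"),
    (7, "lsw1-f1-eqiad", "et-0/0/49", "ssw1-f1-eqiad", "et-0/0/1"),
    (8, "lsw1-e1-eqiad", "et-0/0/50", "ssw1-e1-eqiad", "et-0/0/2"),
    (9, "lsw1-f1-eqiad", "et-0/0/50", "ssw1-f1-eqiad", "et-0/0/2"),
    (10, "lsw1-e1-eqiad", "et-0/0/53", "ssw1-e1-eqiad", "et-0/0/9"),
    (11, "lsw1-f1-eqiad", "et-0/0/53", "ssw1-f1-eqiad", "et-0/0/9"),
    (12, "lsw1-e1-eqiad", "et-0/0/54", "ssw1-e1-eqiad", "et-0/0/10"),
    (13, "lsw1-f1-eqiad", "et-0/0/54", "ssw1-f1-eqiad", "et-0/0/10"),
    (14, "lsw1-e1-eqiad", "et-0/0/51", "lsw1-e1-eqiad", "et-0/0/54"),
    (15, "lsw1-f1-eqiad", "et-0/0/51", "lsw1-f1-eqiad", "et-0/0/54"),
]


def duplicate_destinations(moves):
    """Return move numbers whose destination port was already claimed by an earlier move."""
    seen, dupes = set(), []
    for no, _, _, new_sw, new_int in moves:
        if (new_sw, new_int) in seen:
            dupes.append(no)
        seen.add((new_sw, new_int))
    return dupes


def ordering_problems(moves):
    """Return move numbers that target a port which a later move in the list still has to vacate."""
    freed_at = {(sw, port): idx for idx, (_, sw, port, _, _) in enumerate(moves)}
    problems = []
    for idx, (no, _, _, new_sw, new_int) in enumerate(moves):
        freed = freed_at.get((new_sw, new_int))
        if freed is not None and freed > idx:
            problems.append(no)
    return problems


if __name__ == "__main__":
    print("duplicate destinations:", duplicate_destinations(MOVES))
    print("ordering problems:", ordering_problems(MOVES))
```

Both lists come back empty for the order given in the table; running the same check on a reordered list shows which moves would hit an occupied port.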

We migrated a bunch of network <-> network links today without issue (crossed them out in the above table). Didn't touch the LVS's after causing a major incident by disabling pybal on lvs1019 while a deploy was going on. Shouldn't be a problem doing them, we just need to avoid doing it when there is a deploy on (lesson learnt on my part; shame it had to happen that way).

Will discuss with @Jclark-ctr about timing and complete the rest of the moves.

@Jclark-ctr let me know when it might suit to try and get more of these moves done. Thanks.

All links have now been migrated. Massive thanks to @Jclark-ctr for all the work on site!