ULSFO:Switch refresh diagram
Open, Medium, Public

Description

Below is the diagram showing how things are connected right now. As mentioned in the master task description, we are moving away from the Virtual Chassis setup to a flat BGP setup. We will also move the BGP connection of the mr router from the core routers to the production switches, to match other POP sites such as drmrs and magru.

ulsfo_before.jpg (462×492 px, 29 KB)

The new setup will look like the one below. For this setup we will need:
2+1 SFP-T Nokia compatible to connect the 2 switches to the mgmt router (mr)
8+1 QSFP-100G-SR4 Nokia compatible.
2x1m MTP multi-mode OM4
2x3m MTP multi-mode OM4

ULSFO_after.jpg (589×904 px, 53 KB)

  • Phase 1
  • Step 1

on-site
Connections that we need to change

  • mr1-ulsfo /scs-ulsfo

remove the connection from mr1-ulsfo port ge-0/0/2 to scs-ulsfo eth0
connect the scs-ulsfo eth0 to msw1-ulsfo any port (please provide port used)

  • mr1-ulsfo/msw2-ulsfo

move the connection on port ge-0/0/3 on mr1-ulsfo to port ge-0/0/2
msw2-ulsfo will now be connected to ge-0/0/2 and not ge-0/0/3

Netops
update the connection in netbox for ge-0/0/2
https://netbox.wikimedia.org/dcim/interfaces/11/
update the connection in netbox for ge-0/0/3
https://netbox.wikimedia.org/dcim/interfaces/12/

  • Step 2 (please complete Step 1 before Step 2)

New connections

  • mgmt

asw1-22-ulsfo mgmt0 to any port on msw1-ulsfo (please provide port used)
asw1-23-ulsfo mgmt0 to any port on msw2-ulsfo (please provide port used)

  • Console

asw1-22-ulsfo console to opengear scs port 8
asw1-23-ulsfo console to opengear scs port 9

  • asw1-22-ulsfo/mr1-ulsfo

connect port ethernet-1/1 on asw1-22-ulsfo to port ge-0/0/3 on mr1-ulsfo

  • Production

asw1-23-ulsfo port ethernet-1/55 to cr3-ulsfo port et-0/0/2
asw1-23-ulsfo port ethernet-1/56 to cr4-ulsfo port et-0/0/2
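
The Phase 1 cabling listed above can be sanity-checked before it is handed to on-site staff. The following is a minimal sketch, assuming the links are transcribed by hand into a list. The `Link` type and the `find_reused_ports` helper are hypothetical names used for illustration; they are not part of Netbox or any existing tooling. Ports that the task leaves open ("any port", or not stated) are recorded as `None` and ignored, and the console server is assumed to be scs-ulsfo.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Link(NamedTuple):
    """One cable, described by (device, port) on each end. Hypothetical type."""
    a_device: str
    a_port: Optional[str]
    b_device: str
    b_port: Optional[str]


# Phase 1 links transcribed from the task description.
# Core router ports are written in Juniper "et-" form (assumption).
PHASE1_LINKS = [
    Link("mr1-ulsfo", "ge-0/0/2", "msw2-ulsfo", None),  # msw2 port not stated
    Link("asw1-22-ulsfo", "console", "scs-ulsfo", "port 8"),
    Link("asw1-23-ulsfo", "console", "scs-ulsfo", "port 9"),
    Link("asw1-22-ulsfo", "ethernet-1/1", "mr1-ulsfo", "ge-0/0/3"),
    Link("asw1-23-ulsfo", "ethernet-1/55", "cr3-ulsfo", "et-0/0/2"),
    Link("asw1-23-ulsfo", "ethernet-1/56", "cr4-ulsfo", "et-0/0/2"),
]


def find_reused_ports(links):
    """Return a sorted list of (device, port) pairs that appear on more than one link."""
    counts = Counter()
    for link in links:
        for device, port in ((link.a_device, link.a_port),
                             (link.b_device, link.b_port)):
            if port is not None:  # skip ports that are still to be reported
                counts[(device, port)] += 1
    return sorted(end for end, n in counts.items() if n > 1)


if __name__ == "__main__":
    reused = find_reused_ports(PHASE1_LINKS)
    print("reused ports:", reused if reused else "none")
```

An empty result only means that no fully specified port appears twice in the list; it says nothing about whether the ports are free on the devices today, which still has to be checked in Netbox.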

  • Phase 2 (will be done during the maintenance windows in January)

In phase 2, we will have 3 connections to make that we didn't do in phase 1

  • asw1-22-ulsfo ethernet-1/55 to cr3-ulsfo et-0/0/1
  • asw1-23-ulsfo ethernet-1/56 to cr4-ulsfo et-0/0/1

because right now we are using et-0/0/1 on core routers to connect to the old switches

  • mr1-ulsfo ge-0/0/4 to asw2-22-ulsfo ge-1/0/13

mr1-ulsfo ge-0/0/4 will connect to asw1-23-ulsfo ethernet-1/1

Event Timeline

Papaul triaged this task as Medium priority.

@Papaul looks good! Nothing jumping out at me as problematic in terms of the connectivity plan.

I don't think it makes sense to use 40G though, we should use 100G links between the switches and the CRs. 100G DAC/AOC/SR4/CWDM4/FR are all valid options. We're already using CWDM4 / duplex SMF between cr3-ulsfo and cr4-ulsfo if it helps to keep it standard. Or possibly use those for inter-rack and DAC/AOC for in-rack?

As the CRs are acting as Spines, this gives us more bandwidth between servers in each of the racks. Not essential given our bandwidth usage, but 40G is effectively dead tech; we shouldn't plan on using more of it.

@cmooney thanks for the feedback, I will upgrade the diagram to match the 100G links between the core routers and the switches and the type of transceivers needed.

Looks great, thanks!

Not sure how best to show it on the diagram, but we also need to remove the 10G link between cr3 and cr4. Maybe you can show the 100G between cr3 and cr4 while mentioning it's already there.

Next step is to write a step by step guide on how to get from the current design to the wanted one. Possibly with intermediate diagrams.

For mr1, now that it's already configured with BGP, an option could be to move one of its upstream ports to one of the Nokias in advance. Until the Nokia is configured, it will only have 1 uplink, but it will allow us to seamlessly transition it.

@ayounsi thanks for the feedback, I will work on it.

@RobH I updated the task description with all the connections that we need for phase 1 in December. Please don't forget the cable IDs. Please let me know if you have any questions. Thanks

@ayounsi I need your input here.
et-0/0/1 on cr3/4-ulsfo is connected to asw2-22/23. The goal was to wait until phase 2 to move et-0/0/1 to the new Nokia switches, but by doing this we will have no connection going to asw1-22-ulsfo during the initial setup and configuration. What I would like to do is temporarily use et-0/0/3 for the initial setup/config, make sure everything works from both CRs and asw1-22-ulsfo, and then during the final phase (migration) move the link from et-0/0/3 to et-0/0/1. See more details in the task description. Thanks

Created ticket Case Order #01144222 for initial racking and wiring of the new Nokia switches.

The first step was completed by remote hands yesterday, but the port numbers and cable IDs were not given to me at the time, so I only got the information today. Both switches are now in Netbox; I will start the initial config.

Initial configuration is done on both switches. What is left on the switches:

  • user homer password
  • sre.network.tls cookbook

I will work on this next week.

Change #1242429 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/homer/public@master] Add new Nokia switches for initial homer run

https://gerrit.wikimedia.org/r/1242429

Change #1242429 merged by Papaul:

[operations/homer/public@master] Add new Nokia switches for initial homer run

https://gerrit.wikimedia.org/r/1242429

The user homer password is set on both switches, and the sre.network.tls cookbook failed on asw1-23-ulsfo. The first homer run on asw1-22-ulsfo gives the error below. The error is about "evpn", since we are not using EVPN on those switches.

Traceback (most recent call last):
  File "/srv/deployment/homer/venv-1770131771/lib/python3.11/site-packages/homer/templates.py", line 115, in render
    instance = module.python_renderer(data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/homer/public/modules/nokia_asw.py", line 12, in __init__
    self._data = self._get_asw_data(data)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/homer/public/modules/nokia_asw.py", line 43, in _get_asw_data
    data["evpn"] = data["netbox"]["device_plugin"]["ibgp_config"]["evpn"]
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'evpn'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/deployment/homer/venv-1770131771/lib/python3.11/site-packages/homer/__init__.py", line 356, in _execute
    device_config = self._renderers[device.metadata.get('renderer', 'jinja')].render(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/deployment/homer/venv-1770131771/lib/python3.11/site-packages/homer/templates.py", line 126, in render
    raise HomerError(f'Error while trying to render JSON-RPC configuration: {e}') from e
homer.exceptions.HomerError: Error while trying to render JSON-RPC configuration: 'evpn'
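
The traceback shows a direct dictionary lookup failing because the device data has no `evpn` key under `ibgp_config`. The following is a minimal sketch of that failure mode and of one defensive alternative, assuming a heavily simplified data layout. `get_evpn_strict` and `get_evpn_optional` are hypothetical names for illustration; this is not the actual change made to `nokia_asw.py`.

```python
def get_evpn_strict(data):
    """Same style of lookup as in the traceback: raises KeyError if 'evpn' is absent."""
    return data["netbox"]["device_plugin"]["ibgp_config"]["evpn"]


def get_evpn_optional(data, default=None):
    """Defensive variant: returns `default` when the device has no EVPN setting."""
    return data["netbox"]["device_plugin"]["ibgp_config"].get("evpn", default)


# Simplified stand-in (assumption) for the data of a switch that does not use EVPN.
no_evpn_device = {"netbox": {"device_plugin": {"ibgp_config": {}}}}

if __name__ == "__main__":
    try:
        get_evpn_strict(no_evpn_device)
    except KeyError as exc:
        print(f"strict lookup failed: {exc!r}")

    # The optional lookup falls back to the supplied default instead of raising.
    print("evpn enabled:", get_evpn_optional(no_evpn_device, default=False))
```

Whether a silent default is appropriate depends on how the rest of the renderer uses the value; an explicit "EVPN not configured" branch may be preferable to a fallback.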

Both switches are now running version 25.10.2. I still cannot get the sre.network.tls cookbook to pass on asw1-23-ulsfo.

@Papaul, can you try a factory reset of the switch from rack 23? (the one failing the TLS cookbook). I'm also still waiting for news from Nokia support.

@ayounsi yes I can do that. Do we have documentation somewhere on how to factory reset the Nokia switch, or is it just "delete /" and reboot it?

@ayounsi I factory reset the switch; same issue.

I think the factory reset helped. I then temporarily copied the TLS config from asw1-22, and ran the TLS cookbook and we're all good.

So now homer runs fine. Now the next step is to convert the relevant routing policies from Junos to Nokia.