
Test Nokia switches BGP config for k8s workers
Closed, Resolved, Public

Description

Before using hosts behind the new Nokia switches, Traffic and ServiceOps need to coordinate and test the new BGP policy. We should pick one test host and check everything works as expected before doing the whole batch.

We currently have the following wikikube workers in codfw racks with Nokia ToR switches:

  • E2
    • wikikube-worker2334
    • wikikube-worker2335
    • wikikube-worker2336
  • E4
    • wikikube-worker2339
    • wikikube-worker2340
    • wikikube-worker2341
  • E5
    • wikikube-worker2342
    • wikikube-worker2343
    • wikikube-worker2344
  • F2
    • wikikube-worker2348
    • wikikube-worker2349
    • wikikube-worker2350
  • F4
    • wikikube-worker2354
    • wikikube-worker2355
    • wikikube-worker2356

All implemented as part of T417772: wikikube-worker23[32-56] implementation tracking

Event Timeline

Raine triaged this task as High priority. Feb 18 2026, 7:35 PM
Raine edited projects, added Infrastructure-Foundations; removed Traffic.

Change #1242351 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] admin/common-bgp: add F4 ToR switch

https://gerrit.wikimedia.org/r/1242351

Change #1242351 merged by jenkins-bot:

[operations/deployment-charts@master] admin/common-bgp: add F4 ToR switch

https://gerrit.wikimedia.org/r/1242351

Change #1242366 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/deployment-charts@master] Add BGP neighbors IPs for codfw E/F racks

https://gerrit.wikimedia.org/r/1242366

Change #1242380 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/deployment-charts@master] Add BGP neighbors IPs for eqiad C/D racks

https://gerrit.wikimedia.org/r/1242380

Change #1242366 merged by jenkins-bot:

[operations/deployment-charts@master] Add BGP neighbors IPs for codfw E/F racks

https://gerrit.wikimedia.org/r/1242366

Change #1242380 merged by jenkins-bot:

[operations/deployment-charts@master] Add BGP neighbors IPs for eqiad C/D racks

https://gerrit.wikimedia.org/r/1242380

Raine changed the task status from Open to In Progress. Feb 23 2026, 2:06 PM
Raine updated Other Assignee, added: ayounsi.
Raine added a subscriber: ayounsi.

The missing ToR switches in the Calico config are fixed (thanks @ayounsi for the patches!); a config change on the switch side is still needed.

Raine moved this task from Scheduled (this Q) to In Progress on the ServiceOps new board.
Raine moved this task from In Progress to Radar (Awareness) on the ServiceOps new board.
Raine updated Other Assignee, added: Raine; removed: ayounsi.

Change #1242410 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Nokia: add local-as to k8s BGP sessions

https://gerrit.wikimedia.org/r/1242410

Change #1242410 merged by jenkins-bot:

[operations/homer/public@master] Nokia: add local-as to k8s BGP sessions

https://gerrit.wikimedia.org/r/1242410

You should be unblocked on the Netops side. Let me know if anything is still not working.

Cookbook cookbooks.sre.k8s.pool-depool-node started by kamila@cumin1003 pool for host wikikube-worker2356.codfw.wmnet completed:

  • wikikube-worker2356.codfw.wmnet (PASS)
    • Host wikikube-worker2356.codfw.wmnet pooled in wikikube-codfw

Looks like pods scheduled on the worker are happy, so LGTM. Thanks @ayounsi!

Spoke too early... Investigating.

Cookbook cookbooks.sre.k8s.pool-depool-node started by kamila@cumin1003 depool for host wikikube-worker2356.codfw.wmnet completed:

  • wikikube-worker2356.codfw.wmnet (PASS)
    • Host wikikube-worker2356.codfw.wmnet depooled from wikikube-codfw

We saw yesterday that the BGP sessions seem to be okay from the wikikube-worker2356 side:

root@wikikube-worker2356:~# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+------------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |   SINCE    |    INFO     |
+--------------+---------------+-------+------------+-------------+
| 10.192.52.1  | node specific | up    | 2026-02-23 | Established |
+--------------+---------------+-------+------------+-------------+

IPv6 BGP status
+-------------------+---------------+-------+------------+-------------+
|   PEER ADDRESS    |   PEER TYPE   | STATE |   SINCE    |    INFO     |
+-------------------+---------------+-------+------------+-------------+
| 2620:0:860:127::1 | node specific | up    | 2026-02-24 | Established |
+-------------------+---------------+-------+------------+-------------+

But pods running there are unreachable. The node currently has the following IPAM blocks assigned (which it should announce to the ToR):

root@deploy1003:~# kubectl get blockaffinities.crd.projectcalico.org |grep wikikube-worker2356
wikikube-worker2356.codfw.wmnet-10-194-184-0-26                               7d17h
wikikube-worker2356.codfw.wmnet-2620-0-860-cabe-34bd-e7c3-888c-37c0-122       7d17h

@ayounsi would you be able to verify that these make it to the Nokia switches (or point us to how we could verify that)?
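The BlockAffinity names above encode the block CIDR that the node should announce. Below is a minimal Python sketch of decoding them. It assumes, based only on the two names shown here and not on Calico documentation, that a name is the node name, a dash, the address with its dots or colons replaced by dashes, and finally the prefix length. `block_from_affinity_name` is a hypothetical helper, and it only handles IPv4 and fully expanded IPv6 addresses like the ones in this output:

```python
import ipaddress
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def block_from_affinity_name(affinity_name: str, node: str) -> Network:
    """Turn a Calico BlockAffinity name back into the CIDR it stands for.

    Assumed naming scheme (inferred from the kubectl output, hypothetical):
    "<node>-<address parts joined by '-'>-<prefix length>".
    """
    prefix = node + "-"
    if not affinity_name.startswith(prefix):
        raise ValueError(f"{affinity_name!r} does not belong to node {node!r}")
    *addr_parts, length = affinity_name[len(prefix):].split("-")
    if len(addr_parts) == 4:        # IPv4: a.b.c.d
        addr = ".".join(addr_parts)
    elif len(addr_parts) == 8:      # IPv6 without "::" compression
        addr = ":".join(addr_parts)
    else:
        raise ValueError(f"unexpected address format in {affinity_name!r}")
    # strict=True (the default) also checks that host bits are zero
    return ipaddress.ip_network(f"{addr}/{length}")


node = "wikikube-worker2356.codfw.wmnet"
print(block_from_affinity_name("wikikube-worker2356.codfw.wmnet-10-194-184-0-26", node))
# 10.194.184.0/26
print(block_from_affinity_name(
    "wikikube-worker2356.codfw.wmnet-2620-0-860-cabe-34bd-e7c3-888c-37c0-122", node))
# 2620:0:860:cabe:34bd:e7c3:888c:37c0/122
```

These are the prefixes to look for in the switch's received routes for the node's BGP neighbor.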

Calico sees the BGP sessions as established, but the pods on the test host are unreachable:

root@deploy2002:~# kube-env admin codfw
root@deploy2002:~# kubectl get po -n opentelemetry-collector -o wide  # list the pods (they have an open HTTP port, handy for testing) along with their nodes
curl 10.194.184.1:4318  # IP of the pod running on wikikube-worker2356 (grep the output above for it) => this hangs

(If you take any other pod's IP address, the pod responds as expected.)

@ayounsi can you help with this please?
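To get a yes/no answer instead of a hanging `curl`, you can run a TCP connect with a timeout, similar to the `nc -zv` check used later in this task. Here is a minimal Python sketch. `tcp_reachable` is a hypothetical helper. It returns False for both timeouts and refused connections, so it does not tell those two cases apart:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within `timeout` seconds.

    A silently dropped connection (e.g. no return route) becomes False after
    `timeout` seconds instead of hanging; a refused connection is also False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # socket.timeout / TimeoutError and ConnectionRefusedError are OSErrors
        return False
```

For example, running `tcp_reachable("10.194.184.1", 4318)` from deploy2002 should return False while the problem described above persists. For a pod on any other node, it should return True.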

A:elukey@lsw1-f4-codfw# show network-instance default protocols bgp neighbor 10.192.52.5 received-routes ipv4
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Peer        : 10.192.52.5, remote AS: 64602, local AS: 14907
Type        : static
Description : wikikube-worker2356
Group       : k8s4
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Status codes: u=used, *=valid, >=best, x=stale, b=backup, w=unused-weight-only
Origin codes: i=IGP, e=EGP, ?=incomplete
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  Status                Network                         Path-id                   Next Hop                 MED                                    LocPref                                   AsPath               Origin      |
+=============================================================================================================================================================================================================================+
|             10.194.184.0/26                 0                               10.192.52.5                    -                                                                         [64602]                       i        |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 received BGP routes : 0 used 0 valid
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--{ running }--[  ]--

I am totally ignorant but the host may be missing some extra config to allow the /26 to be usable. No idea what to do though :(

A:elukey@lsw1-f4-codfw# info from state / network-instance default protocols bgp neighbor 10.192.52.5
[..]
    import-policy [
        k8s4_in
    ]

This is the associated policy:

A:elukey@lsw1-f4-codfw# info / routing-policy policy k8s4_in
    default-action {
        policy-result reject
    }
    statement k8s_prefixes {
        match {
            prefix {
                prefix-set kubernetes-ipv4
            }
        }
    }

The prefixes:

A:elukey@lsw1-f4-codfw# info / routing-policy prefix-set kubernetes-ipv4
    prefix 10.64.72.0/24 mask-length-range 26..26 {
    }
    prefix 10.67.128.0/17 mask-length-range 26..26 {
    }
    prefix 10.192.72.0/24 mask-length-range 26..26 {
    }
    prefix 10.194.128.0/17 mask-length-range 26..26 {
    }

10.194.128.0/17 contains 10.194.184.0/26, so the /26 should be accepted. The magical powers of AI suggest to me that the following statement is missing an explicit accept:

statement k8s_prefixes {
    match {
        prefix {
            prefix-set kubernetes-ipv4
        }
    }
}

Or not. I'm not comfortable adding anything to the switches at the moment, so we'll have to wait for Arzhel :)
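The match half of `k8s_prefixes` can be checked offline. The Python sketch below treats a prefix-set entry as meaning "inside this prefix, with a mask length within mask-length-range". This is an assumption about SR Linux semantics that fits the config shown. The sketch confirms that 10.194.184.0/26 does match `kubernetes-ipv4`. The rejection therefore has to come from the policy's action, not from the match:

```python
import ipaddress

# The kubernetes-ipv4 prefix-set from lsw1-f4-codfw:
# (prefix, minimum mask length, maximum mask length)
KUBERNETES_IPV4 = [
    ("10.64.72.0/24", 26, 26),
    ("10.67.128.0/17", 26, 26),
    ("10.192.72.0/24", 26, 26),
    ("10.194.128.0/17", 26, 26),
]


def matches_prefix_set(route: str, prefix_set=KUBERNETES_IPV4) -> bool:
    """True if `route` is inside one of the prefixes and its own mask length
    lies within that entry's mask-length-range (IPv4 only in this sketch)."""
    net = ipaddress.ip_network(route)
    for prefix, min_len, max_len in prefix_set:
        if net.subnet_of(ipaddress.ip_network(prefix)) and min_len <= net.prefixlen <= max_len:
            return True
    return False


print(matches_prefix_set("10.194.184.0/26"))  # True: inside 10.194.128.0/17, length 26
print(matches_prefix_set("10.194.184.0/24"))  # False: right range, wrong mask length
```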

Change #1245200 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] k8s: add missing accept statement

https://gerrit.wikimedia.org/r/1245200

Nice investigation @elukey! I'm still supposed to be off today, but here is a Homer patch that adds the missing statement: https://gerrit.wikimedia.org/r/1245200
Unless someone wants to deploy it today, I'll do it on Monday.

Diff from my local Homer with that CR:
      routing-policy {
          policy k8s4_in {
              statement k8s_prefixes {
                  action {
+                     policy-result accept
                  }
              }
          }
          policy k8s6_in {
              statement k8s_prefixes {
                  action {
+                     policy-result accept
                  }
              }
          }
      }
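A simplified policy model shows why a matching statement still produced "0 used 0 valid". A statement that matches but has no `policy-result` decides nothing, so evaluation falls through to `default-action` (reject). This Python sketch assumes that fall-through behaviour based on the before/after outputs in this task, not on the SR Linux documentation. `Statement`, `evaluate` and `in_kubernetes_ipv4` are hypothetical names, and only the 10.194.128.0/17 entry of the prefix-set is modelled:

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Optional


def in_kubernetes_ipv4(route: str) -> bool:
    # Simplified: only the 10.194.128.0/17 entry, mask-length-range 26..26
    net = ipaddress.ip_network(route)
    return net.subnet_of(ipaddress.ip_network("10.194.128.0/17")) and net.prefixlen == 26


@dataclass
class Statement:
    name: str
    match: Callable[[str], bool]
    policy_result: Optional[str] = None  # None = statement has no action configured


def evaluate(route: str, statements: List[Statement], default_action: str) -> str:
    """Return the explicit result of the first matching statement, else the default action."""
    for stmt in statements:
        if stmt.match(route) and stmt.policy_result is not None:
            return stmt.policy_result
    return default_action


# k8s4_in before and after the "add missing accept statement" patch
before = [Statement("k8s_prefixes", in_kubernetes_ipv4)]
after = [Statement("k8s_prefixes", in_kubernetes_ipv4, policy_result="accept")]

print(evaluate("10.194.184.0/26", before, default_action="reject"))  # reject
print(evaluate("10.194.184.0/26", after, default_action="reject"))   # accept
```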

Change #1245200 merged by Elukey:

[operations/homer/public@master] k8s: add missing accept statement

https://gerrit.wikimedia.org/r/1245200

Thanks for the patch! Merged and committed to (hopefully) all the codfw lsw switches mentioned in the task description. I checked lsw1-f4 and it looks better now:

A:elukey@lsw1-f4-codfw# show network-instance default protocols bgp neighbor 10.192.52.5 received-routes ipv4
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Peer        : 10.192.52.5, remote AS: 64602, local AS: 14907
Type        : static
Description : wikikube-worker2356
Group       : k8s4
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Status codes: u=used, *=valid, >=best, x=stale, b=backup, w=unused-weight-only
Origin codes: i=IGP, e=EGP, ?=incomplete
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|    Status                     Network                                 Path-id                          Next Hop                     MED                                               LocPref                                              AsPath                   Origin         |
+====================================================================================================================================================================================================================================================================================+
|     u*>        10.194.184.0/26                         0                                       10.192.52.5                           -                                                                                            [64602]                              i           |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 received BGP routes : 1 used 1 valid
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Highlight: 1 received BGP routes : 1 used 1 valid

@Raine @JMeybohm when you have a moment can you confirm if pods are happier now?

Unfortunately no. I tried to poke around a bit on the Calico side, and it seems to me that the switch is not exporting any routes:

bird> show protocols all Node_10_192_52_1
name     proto    table    state  since       info
Node_10_192_52_1 BGP      master   up     10:27:54    Established   
  Description:    Connection to BGP peer
  Preference:     100
  Input filter:   (unnamed)
  Output filter:  (unnamed)
  Routes:         0 imported, 1 exported, 0 preferred
  Route change stats:     received   rejected   filtered    ignored   accepted
    Import updates:              0          0          0          0          0
    Import withdraws:            0          0        ---          0          0
    Export updates:             51          0         50        ---          1
    Export withdraws:            1        ---        ---        ---          0
  BGP state:          Established
    Neighbor address: 10.192.52.1
    Neighbor AS:      14907
    Neighbor ID:      10.192.255.38
    Neighbor caps:    refresh restart-aware AS4
    Session:          external multihop AS4
    Source address:   10.192.52.5
    Hold timer:       70/90
    Keepalive timer:  30/30

bird> show route protocol Node_10_192_52_1
bird>
bird> show route export Node_10_192_52_1
10.194.184.0/26    blackhole [static1 10:27:52] * (200)

This is pure speculation on my part, using AI tools to do network engineering. I'm doing it out of curiosity, so no judgement please :D

The following seems weird:

A:elukey@lsw1-f4-codfw# show network-instance default route-table ipv4-unicast prefix 10.194.184.0/26
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IPv4 unicast route table of network instance default
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+--------------------------------------------+-------+------------+----------------------+----------+----------+---------+------------+---------------------------+---------------------------+---------------------------+---------------------------+
|                   Prefix                   |  ID   | Route Type |     Route Owner      |  Active  |  Origin  | Metric  |    Pref    |      Next-hop (Type)      |    Next-hop Interface     |  Backup Next-hop (Type)   | Backup Next-hop Interface |
|                                            |       |            |                      |          | Network  |         |            |                           |                           |                           |                           |
|                                            |       |            |                      |          | Instance |         |            |                           |                           |                           |                           |
+============================================+=======+============+======================+==========+==========+=========+============+===========================+===========================+===========================+===========================+
| 10.194.184.0/26                            | 0     | bgp        | bgp_mgr              | True     | default  | 0       | 170        | 10.192.52.0/24            | irb0.2057                 |                           |                           |
|                                            |       |            |                      |          |          |         |            | (indirect/local)          |                           |                           |                           |
+--------------------------------------------+-------+------------+----------------------+----------+----------+---------+------------+---------------------------+---------------------------+---------------------------+---------------------------+
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The next hop for 10.194.184.0/26 is shown as 10.192.52.0/24, whereas it should be 10.192.52.5 (see T417817#11657581).

The magic AI powers suggest that this feature may be the culprit, since with it the switch may not "trust" 10.192.52.5:

A:elukey@lsw1-f4-codfw# info from state network-instance default ip-forwarding receive-ipv4-check
    receive-ipv4-check true

I have no idea if we can disable it or not as test :D

CLI output might seem odd, but the route is working properly.

bast2003:~$ ping 10.194.184.46
PING 10.194.184.46 (10.194.184.46) 56(84) bytes of data.
64 bytes from 10.194.184.46: icmp_seq=1 ttl=59 time=0.246 ms
64 bytes from 10.194.184.46: icmp_seq=2 ttl=59 time=0.420 ms
bast2003:~$ nc -zv 10.194.184.46 4318
Connection to 10.194.184.46 4318 port [tcp/*] succeeded!
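The "odd" CLI output may have a simple explanation. The Next-hop column may show the route that resolves the BGP next hop, rather than the BGP next hop itself. 10.192.52.5 falls inside the connected subnet 10.192.52.0/24 on irb0.2057, which would explain "(indirect/local)". The Python sketch below illustrates that recursive lookup. The route table is hypothetical and trimmed to the single entry visible in the route-table output above, and `resolve_next_hop` is a hypothetical name:

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical, trimmed local route table on lsw1-f4-codfw: prefix -> interface.
# Only this connected entry comes from the route-table output above.
LOCAL_ROUTES = {
    ipaddress.ip_network("10.192.52.0/24"): "irb0.2057",
}


def resolve_next_hop(bgp_next_hop: str) -> Optional[Tuple[str, str]]:
    """Resolve a BGP protocol next hop by longest-prefix match on local routes.

    Returns (resolving prefix, interface), or None if the next hop is unresolvable.
    """
    addr = ipaddress.ip_address(bgp_next_hop)
    matches = [net for net in LOCAL_ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), LOCAL_ROUTES[best]


# 10.194.184.0/26 was learned from wikikube-worker2356 with next hop 10.192.52.5
print(resolve_next_hop("10.192.52.5"))  # ('10.192.52.0/24', 'irb0.2057')
```

Under this reading, packets for the /26 are still sent to 10.192.52.5. The /24 only tells the switch which interface reaches it.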

> Unfortunately no. I tried to poke around a bit on the Calico side, and it seems to me that the switch is not exporting any routes:
> bird> show protocols all Node_10_192_52_1
> [...]

That's expected, the switch is the router and the only route out of the host, so it just uses the host's default route, no need to learn more specifics.
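Here is the host side of that explanation as a Python sketch. The host has a default route via the ToR, so a longest-prefix-match lookup sends every off-subnet destination to the switch. A more specific route learned from that same switch would not change the next hop. The route table is hypothetical. It assumes the default gateway is 10.192.52.1, the BGP peer address from the `calicoctl node status` output earlier. 10.64.72.10 is an arbitrary example destination:

```python
import ipaddress

# Hypothetical route table on the worker: destination -> next hop (None = on-link).
host_routes = {
    ipaddress.ip_network("0.0.0.0/0"): "10.192.52.1",   # default route via the ToR (assumed)
    ipaddress.ip_network("10.192.52.0/24"): None,       # the worker's own subnet
}


def next_hop(routes, dst):
    """Longest-prefix match: return the next hop of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in routes if addr in net), key=lambda net: net.prefixlen)
    return routes[best]


before = next_hop(host_routes, "10.64.72.10")
# Pretend the ToR also advertised a more specific route: it would point back at the ToR.
learned = dict(host_routes)
learned[ipaddress.ip_network("10.64.72.0/24")] = "10.192.52.1"
after = next_hop(learned, "10.64.72.10")

print(before, after)                          # 10.192.52.1 10.192.52.1
print(next_hop(host_routes, "10.192.52.20"))  # None (directly connected)
```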

> CLI output might seem odd, but the route is working properly.
>
> bast2003:~$ ping 10.194.184.46
> PING 10.194.184.46 (10.194.184.46) 56(84) bytes of data.
> 64 bytes from 10.194.184.46: icmp_seq=1 ttl=59 time=0.246 ms
> 64 bytes from 10.194.184.46: icmp_seq=2 ttl=59 time=0.420 ms
> bast2003:~$ nc -zv 10.194.184.46 4318
> Connection to 10.194.184.46 4318 port [tcp/*] succeeded!

Surprise! I've only ever tested from eqiad nodes, and from there ICMP and TCP still don't work.

> That's expected, the switch is the router and the only route out of the host, so it just uses the host's default route, no need to learn more specifics.

Thanks for confirming. I was assuming that additional routes would only be learned when there are more k8s workers in the same rack (e.g. so that the node could reach the next hop directly).

So we had to configure local-as 14907 to work around a Calico/k8s limitation (partly because they use an old BIRD version under the hood).

With the local-as 14907 option previously configured, the switch by default prepends ASN 14907 to the route's AS path when exporting it to its spine:

View from the spine:
ssw1-e1-codfw> show route 10.194.184.0/26     
10.194.184.0/26    *[BGP/170] 3d 01:11:29, localpref 100, from 10.192.255.38
                      AS path: 14907 64602 I, validation-state: unverified
                    >  to 10.192.253.177 via et-0/0/8.0

So from the leaf PoV, everything was fine.

But then the spine wouldn't export the prefix to the core routers. The core routers use AS14907 as their own ASN, so that would have created an AS loop, which is not allowed by default in BGP.
That's why connectivity worked within codfw but not beyond it.
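The effect of the prepend can be sketched in Python. This is a deliberately simplified model that assumes only the two behaviours described in this comment. First, the leaf prepends its `local-as` (14907) unless told otherwise. Second, a route whose AS path already contains a router's own ASN is treated as a loop, which is standard BGP loop prevention. Any other ASNs in the real path are left out, and the function names are hypothetical:

```python
from typing import List

CORE_ASN = 14907    # ASN of the core routers, also used as local-as on the leaf
CALICO_ASN = 64602  # AS the worker announces from (per the received-routes output)


def export_from_leaf(path: List[int], local_as: int, prepend_local_as: bool) -> List[int]:
    """AS path sent towards the spine for a route learned from a k8s worker (simplified)."""
    return [local_as] + path if prepend_local_as else list(path)


def has_as_loop(path: List[int], own_asn: int) -> bool:
    # BGP loop prevention: a path that already contains our own ASN is unusable.
    return own_asn in path


from_worker = [CALICO_ASN]
default = export_from_leaf(from_worker, local_as=CORE_ASN, prepend_local_as=True)
fixed = export_from_leaf(from_worker, local_as=CORE_ASN, prepend_local_as=False)

print(default, has_as_loop(default, CORE_ASN))  # [14907, 64602] True
print(fixed, has_as_loop(fixed, CORE_ASN))      # [64602] False
```

The first path matches the `14907 64602` AS path seen on the spine. With prepending disabled, 14907 no longer appears in the path, so the core routers see no loop.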

Fortunately, there is a configuration knob made just to avoid that:
lsw1-f4-codfw# set /network-instance default protocols bgp group k8s4 local-as prepend-local-as false

I manually configured it, and I can now reach the test IP from eqiad:

deploy1003:~$ nc -zv 10.194.184.46 4318
Connection to 10.194.184.46 4318 port [tcp/*] succeeded!

I'll send a Homer patch to normalize that shortly.

Change #1247039 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Nokia: don't prepend local-as when local-as is used

https://gerrit.wikimedia.org/r/1247039

Change #1247039 merged by jenkins-bot:

[operations/homer/public@master] Nokia: don't prepend local-as when local-as is used

https://gerrit.wikimedia.org/r/1247039

Updated configuration has been deployed to all relevant switches in codfw. Thanks @ayounsi !

I'm going to repool wikikube-worker2356 in a minute and reschedule some workload there.

Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin1003 pool for host wikikube-worker2356.codfw.wmnet completed:

  • wikikube-worker2356.codfw.wmnet (PASS)
    • Host wikikube-worker2356.codfw.wmnet pooled in wikikube-codfw

Done. I'm getting responses from mw-debug (via WikimediaDebug) scheduled there so I think we're good on this front.

Change #1265386 had a related patch set uploaded (by JMeybohm; author: Ayounsi):

[operations/homer/public@master] k8s4_in / k8s6_in - add missing policy-result: accept

https://gerrit.wikimedia.org/r/1265386

Change #1265386 merged by jenkins-bot:

[operations/homer/public@master] k8s4_in / k8s6_in - add missing policy-result: accept

https://gerrit.wikimedia.org/r/1265386