
POPs - free up 2xQSFP ports
Open, Low · Public

Description

To make room for the soon-to-arrive transport links, we need to replace the direct cr<->cr links with 2 vlans through the ToR switches.

Sites:

  • esams (40G + 100G)
  • ulsfo
  • eqsin
  • drmrs
  • magru

Steps per site:

  • Assign new IPs
  • Create/trunk vlans
  • Depool site (optional)
  • Update OSPF/BGP config
  • Verify

Related Objects

Status: Open · Assigned: cmooney

Event Timeline

ayounsi triaged this task as High priority.
Restricted Application added a subscriber: Aklapper.
ayounsi added a parent task: Unknown Object (Task). · Apr 28 2026, 6:11 AM

My basic thoughts on this are:

  • We create a new vlan on each top-of-rack switch at the POPs for the "core router transport"
    • suggest corebgp-<rack>-<site> for it
  • We allocate a public IPv6 /64 for it
  • For IPv4 a /30 is probably enough?
    • Part of me feels like we could maybe assign a /29 in case we have more CRs in future, or to ease migration from old -> new CR?
  • On the CRs we add a new sub-interface on each switch-facing port connecting to these new vlans, with one IP of each address family per CR
  • We run OSPF on this new sub-interface, to learn the loopback IP of the other CR(s)
  • The existing IBGP peering does not need to change, but we can then cost-out the old direct link, shifting traffic to the new path via the switches

Something like that anyway.
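
To illustrate the "cost-out" step: between the same two routers, OSPF prefers the adjacency with the lowest metric and load-shares across equal-cost ones. Here is a minimal Python sketch of that selection. The link names and metric values are hypothetical and are not taken from the actual router config.

```python
def preferred_links(metrics: dict[str, int]) -> list[str]:
    """Return the parallel links that share the lowest metric (ECMP set)."""
    best = min(metrics.values())
    return sorted(name for name, cost in metrics.items() if cost == best)


# Hypothetical metrics for the adjacencies between two CRs at one site.
before = {"direct": 10, "via-switch-a": 100, "via-switch-b": 100}

# "Costing out" the direct link: raise its metric above the switch paths.
after = {**before, "direct": 1000}

print(preferred_links(before))
print(preferred_links(after))
```

With these example values the direct link is preferred at first, and both switch paths take over once its metric is raised. The IBGP session between loopbacks stays up throughout because the loopbacks remain reachable.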

cmooney renamed this task from POPs - free up 2x100G ports to POPs - free up 2xQSFP ports. · Apr 28 2026, 3:26 PM
cmooney edited parent tasks, added: Restricted Task; removed: Unknown Object (Task). · Apr 29 2026, 12:26 PM

suggest corebgp-<rack>-<site> for it

I suggest core1 instead of corebgp but that lgtm!

For v4 I'd have thought a /31 for a vlan used only between 2 CRs.
So if we add a third CR (CR3), we then create core2-xx-yyyyy to connect cr1 to cr3, and core3 to connect cr2 to cr3? A bit like the Xlink we have around.
Maybe even just go with unnumbered BGP/OSPF ?

I suggest core1 instead of corebgp but that lgtm!

Yep that works :)

For v4 I'd have thought a /31 for a vlan used only between 2 CRs.
So if we add a third CR (CR3), we then create core2-xx-yyyyy to connect cr1 to cr3, and core3 to connect cr2 to cr3? A bit like the Xlink we have around.
Maybe even just go with unnumbered BGP/OSPF ?

No strong preference. In my mind the neater way to do it is use one vlan per switch, with all CRs connected to it using that vlan if they have to exchange traffic. As things scale that's more efficient. With the "p2p" approach if you've 6 CRs you end up with 15 vlans per switch, and 15 x /31s = 30 IPs used. Versus one vlan and a /29 using only 8 IPs.
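
The arithmetic behind those numbers, as a small Python sketch. The function names are made up for illustration, and the shared-vlan case assumes a conventional IPv4 subnet with network and broadcast addresses reserved.

```python
from math import comb


def p2p_plan(num_crs: int) -> tuple[int, int]:
    """One vlan and one /31 per router pair: (vlans per switch, IPv4 addresses)."""
    vlans = comb(num_crs, 2)  # number of distinct router pairs
    return vlans, vlans * 2   # a /31 holds exactly 2 addresses


def shared_plan(num_crs: int) -> tuple[int, int, int]:
    """One shared vlan: (vlans per switch, prefix length, IPv4 addresses)."""
    prefixlen = 30
    # Widen the subnet until it has enough usable host addresses.
    while 2 ** (32 - prefixlen) - 2 < num_crs:
        prefixlen -= 1
    return 1, prefixlen, 2 ** (32 - prefixlen)


print(p2p_plan(6))     # (15, 30)
print(shared_plan(6))  # (1, 29, 8)
```

For two CRs the p2p approach needs a single /31, which is why it is the cheaper option today.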

But no point getting bogged down on a hypothetical; we can do a /31 for now and revisit if the scenario arises.

Maybe even just go with unnumbered BGP/OSPF ?

I'm probably old and conservative but I'm not a huge fan of this. Perhaps down the road, but I'm not sure we should rush into it now.

I'd probably be more amenable to static IPv6 addressing and OSPFv3, with IPv6 next-hops for the BGP routes. But I still think for this project it might be adding more than we want. I definitely think it would complicate the migration, given the existing links are dual stack and we have BGP between the IPv4 loopbacks.

So my instincts are to keep this as things are, and perhaps review some of that when we look at the wider WAN setup / core router config?

So anyway, for now I'd propose we add the following vlans for this:

341  core1-bw27-esams
342  core1-by27-esams

441  core1-22-ulsfo
442  core1-23-ulsfo

541  core1-603-eqsin
542  core1-604-eqsin

641  core1-b12-drmrs
642  core1-b13-drmrs

741  core1-b3-magru
742  core1-b4-magru
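
The numbering above appears to follow a per-site base plus 41/42 for the two racks. A minimal Python sketch that reproduces the list under that assumption follows. The `SITE_BASE` values and the +41 offset are inferred from the list itself, not from a documented convention.

```python
# Assumed per-site vlan ID base, inferred from the proposed IDs.
SITE_BASE = {"esams": 300, "ulsfo": 400, "eqsin": 500, "drmrs": 600, "magru": 700}

# Racks hosting the two ToR switches at each site, as named in the list.
RACKS = {
    "esams": ["bw27", "by27"],
    "ulsfo": ["22", "23"],
    "eqsin": ["603", "604"],
    "drmrs": ["b12", "b13"],
    "magru": ["b3", "b4"],
}


def core_vlans(site: str) -> list[tuple[int, str]]:
    """Return (vlan_id, vlan_name) pairs for the core1 vlans at a site."""
    return [
        (SITE_BASE[site] + 41 + index, f"core1-{rack}-{site}")
        for index, rack in enumerate(RACKS[site])
    ]


for site in SITE_BASE:
    for vlan_id, name in core_vlans(site):
        print(vlan_id, name)
```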

@ayounsi let me know if those look good in terms of numbering convention/names. If you're happy to go with the dual-stack public addressing I can assign the subnets/IPs and get going on the config.

OK, that sounds good to me! Thanks

Agh hit a bit of a hiccup with this (really should have anticipated). Take drmrs for example:

cmooney@asw1-b12-drmrs> show configuration interfaces et-0/0/48 
description "Core: cr1-drmrs:et-0/0/1 {#D0100}";
mtu 9192;
unit 0 {
    family inet {
        address 185.15.58.143/31;
    }
    family inet6 {
        address 2a02:ec80:600:fe06::2/64;
    }
}

That's obviously a layer-3 interface peering with the CR. So we can't just add the new vlan to it as a trunk port (we can on the Nokias which is nice).

I'm not a fan but I think the only way we can do this is to change that to a trunk, and move the existing IPs to new Xlink vlans (one for each CR<->asw link)?

Vlan 651, 652, 653 might be the way to number them?
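
To find which switch ports are affected, a rough Python sketch that pulls the addresses out of a stanza like the one above. Any address it finds would need to move to an Xlink vlan interface before the port can become a trunk. The regex is a simplification that only handles plain `address x/y;` lines, and `routed_addresses` is a made-up helper rather than part of any existing tooling.

```python
import ipaddress
import re

# The unit 0 stanza from the asw1-b12-drmrs output above.
STANZA = """\
unit 0 {
    family inet {
        address 185.15.58.143/31;
    }
    family inet6 {
        address 2a02:ec80:600:fe06::2/64;
    }
}
"""


def routed_addresses(stanza: str) -> list:
    """Return the IPv4/IPv6 interface addresses configured in a Junos stanza."""
    found = re.findall(r"address\s+([0-9a-fA-F:./]+);", stanza)
    return [ipaddress.ip_interface(text) for text in found]


for addr in routed_addresses(STANZA):
    # addr.network is the link subnet that would move with the address.
    print(addr, "in", addr.network)
```

An empty result would suggest the port has no layer-3 addressing and could take the new vlan without this extra step.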

Mentioned in SAL (#wikimedia-operations) [2026-05-12T19:06:21Z] <topranks> migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T19:25:58Z] <topranks> migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T19:43:20Z] <topranks> migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T19:52:32Z] <topranks> migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T20:05:19Z] <topranks> migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T20:16:20Z] <topranks> migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T20:23:25Z] <topranks> migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T20:35:26Z] <topranks> migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T20:44:48Z] <topranks> migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side T424611

Change #1286501 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Add INCLUDEs for new IPs allocated for IBGP peering at POPs

https://gerrit.wikimedia.org/r/1286501

Mentioned in SAL (#wikimedia-operations) [2026-05-12T20:54:28Z] <topranks> migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T21:03:48Z] <topranks> migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-12T21:15:34Z] <topranks> migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side T424611

Change #1286501 merged by Cathal Mooney:

[operations/dns@master] Add INCLUDEs for new IPs allocated for IBGP peering at POPs

https://gerrit.wikimedia.org/r/1286501

Mentioned in SAL (#wikimedia-operations) [2026-05-13T08:08:43Z] <topranks> reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T10:33:16Z] <topranks> switch eqsin core router ibgp path to route via switches T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T11:27:32Z] <topranks> add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T11:40:28Z] <topranks> delete old direct ibgp peering between cr1-drms and cr2-drmrs T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T12:02:36Z] <topranks> add ibgp peering between cr1-esams and cr2-esams over loopback IPs T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T12:38:40Z] <topranks> add ibgp peering between cr1-magru and cr2-magru over loopback IPs T424611

Change #1286913 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add new IBGP sub-interfaces to OSPF on core routers at POPs

https://gerrit.wikimedia.org/r/1286913

Change #1286913 merged by jenkins-bot:

[operations/homer/public@master] Add new IBGP sub-interfaces to OSPF on core routers at POPs

https://gerrit.wikimedia.org/r/1286913

Mentioned in SAL (#wikimedia-operations) [2026-05-13T16:19:59Z] <topranks> update OSPF config on eqsin core routers to shift traffic to switch links T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T16:29:00Z] <topranks> update OSPF config on drmrs core routers to shift traffic to switch links T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T17:23:58Z] <topranks> update OSPF config on esams core routers to shift traffic to switch links T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-13T17:36:47Z] <topranks> update OSPF config on magru core routers to shift traffic to switch links T424611

Change #1286993 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Reverse PTRs: add include statements for ulsfo and eqsin new ranges

https://gerrit.wikimedia.org/r/1286993

Change #1286993 merged by Cathal Mooney:

[operations/dns@master] Reverse PTRs: add include statements for ulsfo and eqsin new ranges

https://gerrit.wikimedia.org/r/1286993

Change #1287439 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Remove INCLUDE statements for CR<->CR link networks no longer used

https://gerrit.wikimedia.org/r/1287439

Change #1287440 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] common.yaml: remove OSPF definitions for esams/drmrs/magru cr links

https://gerrit.wikimedia.org/r/1287440

Change #1287440 merged by jenkins-bot:

[operations/homer/public@master] common.yaml: remove OSPF definitions for esams/drmrs/magru cr links

https://gerrit.wikimedia.org/r/1287440

Mentioned in SAL (#wikimedia-operations) [2026-05-14T16:21:14Z] <topranks> disable core router direct link at magru now that traffic is flowing via switches T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-14T16:25:12Z] <topranks> disable core router direct link at drmrs now that traffic is flowing via switches T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-14T16:31:47Z] <topranks> disable core router direct link at esams now that traffic is flowing via switches T424611

@RobH we've disabled the following links as part of the work on this, and will need to remove the current fibres and optic modules before we re-use the ports for the new transport circuits being delivered.

Can you advise which option you'd prefer? We can wait until we submit the remote hands requests to connect the new transport circuits and bundle the removal of these old fibres with that. Or we can tidy up the old links now and have those ports fully empty when the new circuits start arriving.

magru

https://netbox.wikimedia.org/dcim/interfaces/33914/trace/

drmrs

https://netbox.wikimedia.org/dcim/interfaces/20594/trace/

ulsfo

https://netbox.wikimedia.org/dcim/interfaces/22/trace/
https://netbox.wikimedia.org/dcim/interfaces/53/trace/

That second one is a 10G link we also had as backup; its port won't be re-used for anything.

esams

https://netbox.wikimedia.org/dcim/interfaces/31989/trace/
https://netbox.wikimedia.org/dcim/interfaces/31990/trace/
https://netbox.wikimedia.org/dcim/interfaces/31991/trace/
https://netbox.wikimedia.org/dcim/interfaces/31992/trace/

Esams is a little odd here. Effectively it is one cable, coming from a single QSFP port with MPO connector on the cr2-esams side, and breaking out to four separate LC connectors on the other end plugged into four separate 10G SFPs on cr1-esams.

Change #1287439 merged by Cathal Mooney:

[operations/dns@master] Remove INCLUDE statements for CR<->CR link networks no longer used

https://gerrit.wikimedia.org/r/1287439

Mentioned in SAL (#wikimedia-operations) [2026-05-15T09:32:16Z] <topranks> Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-15T09:56:05Z] <topranks> Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface T424611

Mentioned in SAL (#wikimedia-operations) [2026-05-15T10:10:18Z] <topranks> Migrate ulsfo cr<->cr traffic to use path via switches not direct link T424611

Change #1287831 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] ulsfo: enable ospf on new links via switches and set metric on direct

https://gerrit.wikimedia.org/r/1287831

Change #1287831 merged by jenkins-bot:

[operations/homer/public@master] ulsfo: enable ospf on new links via switches and set metric on direct

https://gerrit.wikimedia.org/r/1287831

My suggestion here would be to open the decom tasks and leave it to DCops to decide how they want to tackle them.

For esams that finally cleared the high power alarm, and we're going to be able to add Prometheus alerting: https://grafana.wikimedia.org/goto/dfmeymkkj05q8a - https://gerrit.wikimedia.org/r/c/operations/alerts/+/1288462

Everything is more-or-less done here. The eqsin link is still operational, though traffic is flowing via the switches due to OSPF cost. We can leave that one in place for now as it may be useful to us during the upcoming eqsin switch refresh, though I feel if the transport circuits land first we should connect them instead.

cmooney added a subtask: Restricted Task.
cmooney added a subtask: Restricted Task.
cmooney added a subtask: Restricted Task.
cmooney lowered the priority of this task from High to Low. · Fri, May 22, 1:20 PM
RobH closed subtask Restricted Task as Resolved. · Mon, Jun 1, 8:14 PM
RobH closed subtask Restricted Task as Resolved. · Thu, Jun 4, 2:08 PM

Change #1297763 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] eqsin: remove OSPF on ae0 direct link between CRs

https://gerrit.wikimedia.org/r/1297763

Change #1297764 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] eqsin - remove reverse ptr include for 2001:df2:e500:fe05::/64

https://gerrit.wikimedia.org/r/1297764

Change #1297764 merged by Cathal Mooney:

[operations/dns@master] eqsin - remove reverse ptr include for 2001:df2:e500:fe05::/64

https://gerrit.wikimedia.org/r/1297764

Change #1297763 merged by jenkins-bot:

[operations/homer/public@master] eqsin: remove OSPF on ae0 direct link between CRs

https://gerrit.wikimedia.org/r/1297763