
Set consistent MTUs
Closed, Resolved · Public

Description

There are outstanding MTU discrepancies on our network, reported by the Netbox report:
https://netbox.wikimedia.org/extras/reports/network.Network/

We should fix them (or add exceptions where they make sense).

Secondly, we have both 9192 and 9216 (introduced recently) configured, and 9216 is allowlisted in the report. Unless 9216 is needed for overlay reasons, we should standardise on a single value. I'd recommend 9192, as it's the one already configured in most places.
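To make the two checks concrete (MTU mismatch between the ends of a link, and values outside an allowlist), here is a minimal sketch in plain Python. It is not the netbox-extras report: the `Link` type, the `check_links` helper, the device names and the contents of `ALLOWED_MTUS` are all hypothetical stand-ins for what the report reads from Netbox.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical allowlist; the real one lives in the netbox-extras report.
ALLOWED_MTUS = {1500, 9192}


@dataclass
class Link:
    """One cable: the MTU configured on each end (names are free-form, hypothetical)."""
    a_name: str
    a_mtu: int
    b_name: str
    b_mtu: int


def check_links(links: List[Link], allowed=ALLOWED_MTUS) -> List[str]:
    """Return human-readable problems: end-to-end mismatches and non-allowlisted values."""
    problems = []
    for link in links:
        # Both ends of a cable should agree on the MTU.
        if link.a_mtu != link.b_mtu:
            problems.append(
                f"{link.a_name} <-> {link.b_name}: mismatch {link.a_mtu} != {link.b_mtu}"
            )
        # Each end should use one of the allowed values.
        for name, mtu in ((link.a_name, link.a_mtu), (link.b_name, link.b_mtu)):
            if mtu not in allowed:
                problems.append(f"{name}: MTU {mtu} not in allowlist")
    return problems


if __name__ == "__main__":
    links = [
        Link("sw-a:et-0/0/1", 9192, "sw-b:et-0/0/1", 9192),
        Link("sw-c:et-0/0/2", 9216, "sw-d:et-0/0/2", 9192),
    ]
    for problem in check_links(links):
        print(problem)
```

Dropping 9216 from the allowlist means a link like the second one is reported twice: once for the mismatch and once for the non-standard value.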

Event Timeline

ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

Thanks @ayounsi. Yeah, 9216 was the default max I had used for the VXLAN stuff originally, but 9192 is more than enough to carry a 9,000-byte IP packet with the VXLAN encapsulation on top.
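A quick back-of-the-envelope check of that claim, as a sketch: the header sizes are the standard ones (VXLAN per RFC 7348), but whether the inner or outer frame carries an 802.1Q tag is an assumption, so the helper `outer_frame_size` (a hypothetical name) lets you toggle both. It also assumes the configured L2 MTU counts the Ethernet header(s) but not the 4-byte FCS.

```python
ETH_HEADER = 14     # destination MAC + source MAC + EtherType (FCS not counted)
DOT1Q_TAG = 4       # one 802.1Q VLAN tag
VXLAN_HEADER = 8    # RFC 7348
UDP_HEADER = 8
IPV4_HEADER = 20    # no options
IPV6_HEADER = 40    # fixed header, no extension headers


def outer_frame_size(inner_ip_mtu, inner_tagged=False, outer_ipv6=False, outer_tagged=False):
    """Outer Ethernet frame size (no FCS) for one VXLAN-encapsulated inner IP packet."""
    inner_frame = inner_ip_mtu + ETH_HEADER + (DOT1Q_TAG if inner_tagged else 0)
    outer_ip_packet = (
        (IPV6_HEADER if outer_ipv6 else IPV4_HEADER) + UDP_HEADER + VXLAN_HEADER + inner_frame
    )
    return outer_ip_packet + ETH_HEADER + (DOT1Q_TAG if outer_tagged else 0)


if __name__ == "__main__":
    L2_MTU = 9192
    for v6 in (False, True):
        size = outer_frame_size(9000, inner_tagged=True, outer_ipv6=v6, outer_tagged=True)
        print("IPv6" if v6 else "IPv4", size, "fits" if size <= L2_MTU else "too big")
```

Even the worst case here (tagged inner frame, IPv6 underlay, tagged outer frame) comes to 9092 bytes, leaving about 100 bytes of headroom under 9192.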

I'll have a look at netbox/templates and make sure it's cleaned up on the row E/F devices, then see what is left.

Change 838755 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/software/netbox-extras@master] Network MTU check, remove 9216 from allowlist

https://gerrit.wikimedia.org/r/838755

Change 838755 merged by jenkins-bot:

[operations/software/netbox-extras@master] Network MTU check, remove 9216 from allowlist

https://gerrit.wikimedia.org/r/838755

Mentioned in SAL (#wikimedia-operations) [2022-10-05T11:53:23Z] <XioNoX> fix MTU between eqiad core routers and cloudsw - T315838

Just FYI, I've adjusted one of the links on the row E/F switches now. Quick run-down of the process:

  1. Drain the link by changing the OSPF interface cost on both sides:
    • set protocols ospf area 0.0.0.0 interface <interface>.<unit> metric 1000
  2. Adjust the MTU in Netbox to 9192
  3. Run homer against both devices
  4. Check BGP/OSPF adjacencies are OK, then remove the metric:
    • delete protocols ospf area 0.0.0.0 interface <interface>.<unit> metric 1000
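The steps above can be rendered per link as an ordered plan. This is a sketch only: `mtu_change_plan` and `ospf_metric_cmd` are hypothetical helpers, the device and interface names in the usage are made up, and the Netbox and homer steps are left as prose because they go through those tools rather than the Junos CLI. The OSPF commands are exactly the ones listed above.

```python
from typing import List, Tuple

DRAIN_METRIC = 1000
OSPF_AREA = "0.0.0.0"


def ospf_metric_cmd(iface: str, unit: int, drain: bool) -> str:
    """Junos command that adds (drain) or removes (undrain) the high OSPF metric."""
    verb = "set" if drain else "delete"
    return f"{verb} protocols ospf area {OSPF_AREA} interface {iface}.{unit} metric {DRAIN_METRIC}"


def mtu_change_plan(a: Tuple[str, str], b: Tuple[str, str], unit: int, new_mtu: int = 9192) -> List[str]:
    """a and b are (device, interface) pairs for the two ends of the link."""
    plan = []
    # 1. Drain: raise the OSPF cost on both ends so traffic moves off the link.
    for dev, iface in (a, b):
        plan.append(f"[{dev}] {ospf_metric_cmd(iface, unit, drain=True)}")
    # 2. and 3. Change the MTU in Netbox, then push the config to both ends with homer.
    plan.append(f"[netbox] set MTU {new_mtu} on {a[0]}:{a[1]} and {b[0]}:{b[1]}")
    plan.append(f"[homer] push config to {a[0]} and {b[0]}")
    # 4. Verify adjacencies, then remove the drain metric on both ends.
    plan.append("[check] OSPF and BGP adjacencies are up on both ends")
    for dev, iface in (a, b):
        plan.append(f"[{dev}] {ospf_metric_cmd(iface, unit, drain=False)}")
    return plan


if __name__ == "__main__":
    for step in mtu_change_plan(("sw-a", "et-0/0/1"), ("sw-b", "et-0/0/2"), unit=0):
        print(step)
```

Draining both ends before touching the MTU is what makes the OSPF bounce described next harmless: traffic has already moved to other paths.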

The OSPF adjacency bounced with the MTU change. Once the first side was done, it reset and negotiation was stuck at ExStart, as you'd expect. Once the other side of the link was done, OSPF came back up.
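Why it sticks there: RFC 2328 (section 10.6) has a router reject a Database Description packet whose Interface MTU field is larger than it can accept on the receiving interface without fragmentation. A toy model follows; the helper names are hypothetical, and the 9198 "old" value in the usage is illustrative, not read from the devices.

```python
def accepts_dbd(neighbor_advertised_mtu: int, local_ip_mtu: int) -> bool:
    """RFC 2328 10.6: reject a DBD whose advertised MTU exceeds our receive IP MTU."""
    return neighbor_advertised_mtu <= local_ip_mtu


def adjacency_can_form(mtu_a: int, mtu_b: int) -> bool:
    """Both routers must accept each other's DBD packets to get past ExStart/Exchange."""
    return accepts_dbd(mtu_b, mtu_a) and accepts_dbd(mtu_a, mtu_b)


if __name__ == "__main__":
    # Hypothetical mid-change state: side A already lowered, side B still on the old value.
    a, b = 9174, 9198
    print("A accepts B's DBD:", accepts_dbd(b, a))
    print("B accepts A's DBD:", accepts_dbd(a, b))
    print("adjacency forms:", adjacency_can_form(a, b))
```

Only the side with the smaller MTU rejects, but that is enough to stall the adjacency until both ends match again.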

BGP was not affected. This makes sense here as the peering is between loopbacks, rather than directly over the interface being changed.

Anyway, it's safe enough; I'll do some others now.

OK, I've fixed the MTUs for all the underlay / switch-to-switch links in the new cage now.

All that remains on those are the uplink sub-interfaces to the CRs, which for some reason are at 9174. I'll have a look at them again; no doubt there's some reason I concocted for that odd number, but I don't think there is any reason they can't be 9192.

Actually, I've discovered something odd on those sub-interfaces between the switches and the CRs.

Firstly, the value I was seeing was the protocol MTU (i.e. payload MTU), as I was looking at the sub-interface. The 9192 value we set is the L2 MTU on the physical interface.

But between lsw1-f1-eqiad and cr1-eqiad I see this:

```
cmooney@lsw1-f1-eqiad> show interfaces et-0/0/48 | match mtu
  Link-level type: Flexible-Ethernet, MTU: 9192, LAN-PHY mode, Speed: 100Gbps, BPDU Error: None, Loop Detect PDU Error: None, Ethernet-Switching Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
    Protocol inet, MTU: 9174
    Protocol inet6, MTU: 9174

{master:0}
cmooney@lsw1-f1-eqiad> show interfaces et-0/0/48.100 | match mtu
    Protocol inet, MTU: 9174
    Protocol inet6, MTU: 9174

cmooney@re0.cr1-eqiad> show interfaces et-1/0/2 | match mtu
  Link-level type: Flexible-Ethernet, MTU: 9192, MRU: 9200, Speed: 100Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled
    Protocol inet, MTU: 9170
    Protocol inet6, MTU: 9170

{master:0}
cmooney@re0.cr1-eqiad> show interfaces et-1/0/2.100 | match mtu
    Protocol inet, MTU: 9170
    Protocol inet6, MTU: 9170
```

If you look, there is a 4-byte difference in what they think the protocol MTU should be, despite both having 9192 set on the physical interface.

FWIW, I didn't get to the bottom of the MTU difference, but I was able to confirm that it is a real issue, i.e. there is a 4-byte "blackhole" where the switches will transmit packets without trying to fragment them, but the CR will consider them above its MTU and drop them.

Not a worry I think, though. Any host that may be configured for jumbo frames should be set up for a 9000-byte (IP) MTU, so actual end-host traffic should never hit the problem.
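The size of that window follows directly from the values in the CLI output above. A small sketch (the helper `blackholed_sizes` is a hypothetical name). Why each platform subtracts what it does was not established in this task; 18 would match a 14-byte Ethernet header plus one 4-byte VLAN tag and 22 a header plus two tags, but that is an unverified guess.

```python
L2_MTU = 9192
SWITCH_INET_MTU = 9174  # lsw1-f1-eqiad et-0/0/48.100
CR_INET_MTU = 9170      # cr1-eqiad et-1/0/2.100

switch_overhead = L2_MTU - SWITCH_INET_MTU  # 18 bytes
cr_overhead = L2_MTU - CR_INET_MTU          # 22 bytes


def blackholed_sizes(tx_mtu: int, rx_mtu: int) -> range:
    """IP packet sizes the sender transmits unfragmented but the receiver drops."""
    return range(rx_mtu + 1, tx_mtu + 1)


window = blackholed_sizes(SWITCH_INET_MTU, CR_INET_MTU)
print(list(window))  # [9171, 9172, 9173, 9174]
print(9000 in window)  # False: a 9000-byte host MTU stays well clear of the window
```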

Mentioned in SAL (#wikimedia-operations) [2023-01-04T13:15:16Z] <XioNoX> fix missmatch MTU on cloudsw switches - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T13:33:44Z] <XioNoX> fix missmatch MTU on pfw3-codfw - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T13:41:22Z] <XioNoX> drain esams-eqiad link for mtu change - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T13:44:59Z] <XioNoX> repool esams-eqiad link for mtu change - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T14:04:14Z] <XioNoX> fix inconsistent mtu on mr1-ulsfo - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T14:15:10Z] <XioNoX> fix inconsistent mtu on mr1-esams - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T14:22:12Z] <XioNoX> fix inconsistent mtu on mr1-eqsin - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T14:27:15Z] <XioNoX> fix inconsistent mtu on mr1-codfw - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T14:32:21Z] <XioNoX> fix inconsistent mtu on mr1-eqiad - T315838

Mentioned in SAL (#wikimedia-operations) [2023-01-04T14:42:29Z] <XioNoX> fix inconsistent mtu betwen cr1-eqiad<->lsw1-f1 - T315838

Change 875321 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/software/netbox-extras@master] test_mtu: ignore frack + report everything else as failure

https://gerrit.wikimedia.org/r/875321

Change 875321 merged by jenkins-bot:

[operations/software/netbox-extras@master] test_mtu: ignore frack + report everything else as failure

https://gerrit.wikimedia.org/r/875321

The last ones are the Fundraising Infrastructure links (between cr, pfw and fasw). As most of them are not managed by Netbox, I excluded the fr-tech tenant from the report, and also made everything left alert as "failures".
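The two report changes could look roughly like this. It is a sketch under assumptions, not the merged `test_mtu` code: `LinkEnd` and `run_report` are hypothetical stand-ins for Netbox objects and the report's logging, and skipping a link when *either* end belongs to fr-tech is an assumed rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

IGNORED_TENANTS = {"fr-tech"}


@dataclass
class LinkEnd:
    device: str
    interface: str
    mtu: int
    tenant: Optional[str] = None


def run_report(links: List[Tuple[LinkEnd, LinkEnd]]) -> List[Tuple[str, str]]:
    """Return (level, message) pairs; any remaining mismatch is a 'failure'."""
    results = []
    for a, b in links:
        # Skip links touching an ignored tenant (mostly not managed in Netbox).
        if a.tenant in IGNORED_TENANTS or b.tenant in IGNORED_TENANTS:
            continue
        if a.mtu != b.mtu:
            results.append(
                ("failure", f"{a.device}:{a.interface} ({a.mtu}) != {b.device}:{b.interface} ({b.mtu})")
            )
        else:
            results.append(("success", f"{a.device}:{a.interface} <-> {b.device}:{b.interface}"))
    return results


if __name__ == "__main__":
    links = [
        (LinkEnd("cr-x", "xe-0/0/0", 9192, "fr-tech"), LinkEnd("pfw-x", "xe-0/0/1", 1500, "fr-tech")),
        (LinkEnd("sw-a", "et-0/0/1", 9192), LinkEnd("sw-b", "et-0/0/2", 9174)),
    ]
    for level, message in run_report(links):
        print(level, message)
```

Reporting every leftover as a failure (rather than a warning) means any new discrepancy stands out once the known exceptions are excluded.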

The only ones left are the links between cr1/2-eqiad and pfw3-eqiad. I'll use the opportunity of T316542: Upgrade fasw to Junos 21 to fix them (codfw is already done).

@Dwisehaupt please let me know if it's OK to schedule this during the Jan 23-27 maintenance window. Impact should be a few packets lost at worst; codfw went fine.

@ayounsi That window is perfect. I'll add it to our list for the week to make sure we don't forget it.