
Enable IPIP encapsulation for ncredir
Closed, Resolved, Public

Description

We now have all the pieces needed to use IPIP encapsulation with load balanced services. A good guinea pig to test it could be ncredir. The following things are required to enable IPIP encapsulation there:

  • Update pybal
  • Deploy ipip-multiqueue-optimizer on LVS instances
  • Deploy tcp-mss-clamper on ncredir instances
  • Allow inbound IPIP traffic on ncredir instances (T352143)
  • Disable rp_filter on ncredir instances

Enabled in:

  • eqiad
  • codfw
  • esams
  • ulsfo
  • eqsin
  • drmrs

Details

Repo                            Branch      Lines +/-
operations/puppet               production  +10 -6
operations/alerts               master      +5 -4
operations/alerts               master      +56 -0
operations/puppet               production  +2 -18
operations/puppet               production  +7 -0
operations/puppet               production  +1 -0
operations/puppet               production  +2 -0
operations/puppet               production  +5 -0
operations/puppet               production  +1 -0
operations/puppet               production  +2 -0
operations/puppet               production  +5 -0
operations/puppet               production  +1 -0
operations/puppet               production  +2 -0
operations/puppet               production  +5 -0
operations/puppet               production  +1 -0
operations/puppet               production  +2 -0
operations/puppet               production  +117 -24
operations/puppet               production  +147 -1
operations/puppet               production  +1 -0
operations/puppet               production  +7 -0
operations/puppet               production  +3 -0
operations/puppet               production  +5 -9
operations/puppet               production  +0 -2
operations/puppet               production  +4 -0
operations/puppet               production  +2 -2
operations/puppet               production  +5 -0
operations/puppet               production  +2 -2
operations/puppet               production  +6 -0
operations/puppet               production  +67 -6
operations/puppet               production  +96 -1
operations/puppet               production  +26 -0
operations/puppet               production  +27 -1
operations/puppet               production  +44 -6
operations/puppet               production  +113 -4
operations/puppet               production  +31 -8
operations/software/spicerack   master      +3 -0
operations/puppet               production  +30 -8

Event Timeline


Mentioned in SAL (#wikimedia-operations) [2023-11-21T16:41:45Z] <vgutierrez@cumin1001> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1017-1019].eqiad.wmnet} and A:lvs (T351069)

Change 974623 merged by Vgutierrez:

[operations/puppet@production] pybal,wmflib::service: Add ipip_encapsulation flag on lvs

https://gerrit.wikimedia.org/r/974623

Change 975253 merged by Vgutierrez:

[operations/puppet@production] interface: Allow creating IPIP interfaces w/o an endpoint

https://gerrit.wikimedia.org/r/975253

Change 975324 merged by Vgutierrez:

[operations/puppet@production] interface: Add a clsact helper

https://gerrit.wikimedia.org/r/975324

Mentioned in SAL (#wikimedia-operations) [2023-11-22T09:53:27Z] <vgutierrez> rolling restart of pybal to catch up on a NOOP config update - T351069

Mentioned in SAL (#wikimedia-operations) [2023-11-22T09:53:52Z] <vgutierrez@cumin1001> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs (T351069)

Mentioned in SAL (#wikimedia-operations) [2023-11-22T10:21:46Z] <vgutierrez@cumin1001> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs (T351069)

Change 976737 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] lvs: Deploy ipip-multiqueue-optimizer for IPIP enabled balancers

https://gerrit.wikimedia.org/r/976737

Change 977046 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] interface::manual: Fix absenting

https://gerrit.wikimedia.org/r/977046

Change 977056 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] interface::ipip: Fix absenting

https://gerrit.wikimedia.org/r/977056

Change 977046 merged by Vgutierrez:

[operations/puppet@production] interface::manual: Fix absenting

https://gerrit.wikimedia.org/r/977046

Change 977056 merged by Vgutierrez:

[operations/puppet@production] interface::ipip: Fix absenting

https://gerrit.wikimedia.org/r/977056

@ayounsi what would be the required TCP MSS clamping values? Per https://phabricator.wikimedia.org/T348837#9256494, it seems that roughly 1400 bytes for both IPv4 and IPv6 should be OK?

That's a great question. I don't think we have the resources to do an extensive investigation.

I see 2 options:

  1. either we only subtract the tunnel header from the default MSS to get the most data out of each packet
    • Fortunately we're not heavy inbound, so it's the size of the outbound packets that matters the most.
  2. or we decrease it further to try to fix the same issue larger providers investigated (clients with tunnels and odd settings)
    • this is slightly off topic, but maybe worth doing as we're there anyway. But the value will be an approximation (e.g. based on what others are doing, or a round MTU or MSS value for ease of troubleshooting)

Thanks @ayounsi, we will go with option 1:

  • IPv4: 1500 - 20 (outer IP) - 20 (inner IP) - 20 (TCP) = 1440 bytes
  • IPv6: 1500 - 40 (outer IPv6) - 40 (inner IPv6) - 20 (TCP) = 1400 bytes
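
The arithmetic for option 1 can be sketched as a small Python helper. This is only an illustration of the subtraction above, not the tcp-mss-clamper code: the function name `clamped_mss` and the constants are hypothetical, and it assumes a 1500-byte MTU, no IP or TCP options, and a tunnel header of the same address family as the inner packet.

```python
# Fixed header sizes in bytes, assuming no IP options, no IPv6 extension
# headers and no TCP options.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def clamped_mss(mtu: int, ip_header: int) -> int:
    """Return the TCP MSS left after tunnel encapsulation.

    Subtracts the outer (tunnel) IP header, the inner IP header and the
    TCP header from the link MTU. Hypothetical helper for illustration.
    """
    outer = ip_header  # tunnel header added by the load balancer
    inner = ip_header  # header of the original client packet
    return mtu - outer - inner - TCP_HEADER


print("IPv4 MSS:", clamped_mss(1500, IPV4_HEADER))
print("IPv6 MSS:", clamped_mss(1500, IPV6_HEADER))
```

With an MTU of 1500 this reproduces the two values chosen above, 1440 bytes for IPv4 and 1400 bytes for IPv6.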

Change 977696 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] lvs::realserver::ipip: Check that TCP MSS clamping is working

https://gerrit.wikimedia.org/r/977696

Change 975342 merged by Vgutierrez:

[operations/puppet@production] profile: Provide a lvs::realserver::ipip profile

https://gerrit.wikimedia.org/r/975342

Change 976737 merged by Vgutierrez:

[operations/puppet@production] lvs,pybal: Deploy ipip-multiqueue-optimizer for IPIP enabled balancers

https://gerrit.wikimedia.org/r/976737

Change 975772 merged by Vgutierrez:

[operations/puppet@production] ncredir: Enable IPIP encapsulation on ulsfo

https://gerrit.wikimedia.org/r/975772

Change 977736 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] interface::clsact: Fix unless cmd

https://gerrit.wikimedia.org/r/977736

Change 977736 merged by Vgutierrez:

[operations/puppet@production] interface::clsact: Fix unless cmd

https://gerrit.wikimedia.org/r/977736

Change 977746 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP on ulsfo LVS

https://gerrit.wikimedia.org/r/977746

Mentioned in SAL (#wikimedia-operations) [2023-11-27T17:52:21Z] <vgutierrez> upload ipip-multiqueue-optimizer 0.2 to apt.wm.o (bullseye) - T351069

Change 977746 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP on ulsfo text|secondary LVS

https://gerrit.wikimedia.org/r/977746

Change 977761 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] profile::lvs: Fix ipip-multiqueue-optimizer systemd unit

https://gerrit.wikimedia.org/r/977761

Change 977761 merged by Vgutierrez:

[operations/puppet@production] profile::lvs: Fix ipip-multiqueue-optimizer systemd unit

https://gerrit.wikimedia.org/r/977761

Mentioned in SAL (#wikimedia-operations) [2023-11-27T18:07:25Z] <vgutierrez> restarting pybal on lvs4010 - T351069

Change 977764 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] service: Enable IPIP encapsulation for ncredir-https too

https://gerrit.wikimedia.org/r/977764

Change 977764 merged by Vgutierrez:

[operations/puppet@production] service: Enable IPIP encapsulation for ncredir-https too

https://gerrit.wikimedia.org/r/977764

Mentioned in SAL (#wikimedia-operations) [2023-11-27T19:50:17Z] <vgutierrez> restarting pybal on lvs4010 - T351069

Mentioned in SAL (#wikimedia-operations) [2023-11-27T19:53:19Z] <vgutierrez> restarting pybal on lvs4008 (effectively enabling IPIP encapsulation on ncredir@ulsfo) - T351069

Change 977782 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] service: Disable IPIP encapsulation for ncredir@ulsfo

https://gerrit.wikimedia.org/r/977782

Change 977782 merged by Vgutierrez:

[operations/puppet@production] service: Disable IPIP encapsulation for ncredir@ulsfo

https://gerrit.wikimedia.org/r/977782

Mentioned in SAL (#wikimedia-operations) [2023-11-27T20:16:14Z] <vgutierrez> rolling restart of pybal on lvs4010 and lvs4008 - T351069

Mentioned in SAL (#wikimedia-operations) [2023-11-28T10:09:24Z] <vgutierrez> rolling restart of pybal on lvs4010 and lvs4008, effectively enabling IPIP encapsulation on ncredir@ulsfo - T351069

Mentioned in SAL (#wikimedia-operations) [2023-11-28T10:21:19Z] <vgutierrez> rolling restart of pybal on lvs4010 and lvs4008, effectively disabling IPIP encapsulation on ncredir@ulsfo - T351069

Mentioned in SAL (#wikimedia-operations) [2023-11-29T12:25:38Z] <vgutierrez> rolling restart of pybal on lvs4008 and lvs4010, effectively enabling IPIP encapsulation for ncredir@ulsfo - T351069

Change 978608 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] prometheus::ops: Fix lvs_realserver_clamper config

https://gerrit.wikimedia.org/r/978608

Change 978608 merged by Vgutierrez:

[operations/puppet@production] prometheus::ops: Fix lvs_realserver_clamper config

https://gerrit.wikimedia.org/r/978608

Change 978624 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] ncredir: Enable IPIP encapsulation on codfw

https://gerrit.wikimedia.org/r/978624

Change 978625 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP on codfw text|secondary LVS

https://gerrit.wikimedia.org/r/978625

Change 978624 merged by Vgutierrez:

[operations/puppet@production] ncredir: Enable IPIP encapsulation on codfw

https://gerrit.wikimedia.org/r/978624

Change 978625 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP on codfw text|secondary LVS

https://gerrit.wikimedia.org/r/978625

Mentioned in SAL (#wikimedia-operations) [2023-11-30T09:59:19Z] <vgutierrez> rolling restart of pybal on lvs2011 and lvs2014, effectively enabling IPIP encapsulation on ncredir@codfw - T351069

Change 979297 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] prometheus::sysctl: Support configurable sysctls

https://gerrit.wikimedia.org/r/979297

Change 979903 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] lvs::realserver::ipip: Clamp on lo too

https://gerrit.wikimedia.org/r/979903

Change 979903 merged by Vgutierrez:

[operations/puppet@production] lvs::realserver::ipip: Clamp on lo too

https://gerrit.wikimedia.org/r/979903

Change 977696 merged by Vgutierrez:

[operations/puppet@production] lvs::realserver::ipip: Check that TCP MSS clamping is working

https://gerrit.wikimedia.org/r/977696

Change 979984 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Disable rp_filter on ncredir@eqsin

https://gerrit.wikimedia.org/r/979984

Change 979985 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@eqsin

https://gerrit.wikimedia.org/r/979985

Change 979297 merged by Vgutierrez:

[operations/puppet@production] prometheus::sysctl: Support configurable sysctls

https://gerrit.wikimedia.org/r/979297

Change 979984 merged by Vgutierrez:

[operations/puppet@production] hiera: Disable rp_filter on ncredir@eqsin

https://gerrit.wikimedia.org/r/979984

Change 979994 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP on eqsin text|secondary LVS

https://gerrit.wikimedia.org/r/979994

Change 979985 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@eqsin

https://gerrit.wikimedia.org/r/979985

Change 979994 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP on eqsin text|secondary LVS

https://gerrit.wikimedia.org/r/979994

Mentioned in SAL (#wikimedia-operations) [2023-12-05T06:55:11Z] <vgutierrez> rolling restart of text|secondary LVS on eqsin effectively enabling IPIP encapsulation for ncredir@eqsin - T351069

Change 980272 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Disable rp_filter for ncredir@drmrs

https://gerrit.wikimedia.org/r/980272

Change 980273 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@drmrs

https://gerrit.wikimedia.org/r/980273

Change 980274 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP on text|secondary LVS in drmrs

https://gerrit.wikimedia.org/r/980274

Change 980280 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/alerts@master] traffic: Alert on configured and observed MSS mismatch

https://gerrit.wikimedia.org/r/980280

Change 980272 merged by Vgutierrez:

[operations/puppet@production] hiera: Disable rp_filter for ncredir@drmrs

https://gerrit.wikimedia.org/r/980272

Change 980273 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@drmrs

https://gerrit.wikimedia.org/r/980273

Change 980274 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP on text|secondary LVS in drmrs

https://gerrit.wikimedia.org/r/980274

Mentioned in SAL (#wikimedia-operations) [2023-12-05T17:45:59Z] <vgutierrez> rolling restart of text|secondary LVS on drmrs effectively enabling IPIP encapsulation for ncredir@drmrs - T351069

Change 981955 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Disable rp_filter for ncredir@esams

https://gerrit.wikimedia.org/r/981955

Change 982038 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@esams

https://gerrit.wikimedia.org/r/982038

Change 982040 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP encapsulation on text|secondary LVS in esams

https://gerrit.wikimedia.org/r/982040

Change 981955 merged by Vgutierrez:

[operations/puppet@production] hiera: Disable rp_filter for ncredir@esams

https://gerrit.wikimedia.org/r/981955

Change 982038 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@esams

https://gerrit.wikimedia.org/r/982038

Change 982040 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP encapsulation on text|secondary LVS in esams

https://gerrit.wikimedia.org/r/982040

Mentioned in SAL (#wikimedia-operations) [2023-12-11T11:20:03Z] <vgutierrez> rolling restart of pybal on lvs3010 and lvs3008 effectively enabling IPIP encapsulation on ncredir@esams - T351069

Change 982063 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Disable rp_filter on ncredir@eqiad

https://gerrit.wikimedia.org/r/982063

Change 982070 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@eqiad

https://gerrit.wikimedia.org/r/982070

Change 982096 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Enable IPIP encapsulation on text|secondary LVS in eqiad

https://gerrit.wikimedia.org/r/982096

Change 982063 merged by Vgutierrez:

[operations/puppet@production] hiera: Disable rp_filter on ncredir@eqiad

https://gerrit.wikimedia.org/r/982063

Change 982070 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP encapsulation on ncredir@eqiad

https://gerrit.wikimedia.org/r/982070

Change 982096 merged by Vgutierrez:

[operations/puppet@production] hiera: Enable IPIP encapsulation on text|secondary LVS in eqiad

https://gerrit.wikimedia.org/r/982096

Mentioned in SAL (#wikimedia-operations) [2023-12-11T16:13:21Z] <vgutierrez> rolling restart of pybal on lvs1020 and lvs1017 effectively enabling IPIP encapsulation on ncredir@eqiad - T351069

Vgutierrez updated the task description.

Change 982124 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Unify ncredir IPIP encapsulation settings

https://gerrit.wikimedia.org/r/982124

Change 982124 merged by Vgutierrez:

[operations/puppet@production] hiera: Unify ncredir IPIP encapsulation settings

https://gerrit.wikimedia.org/r/982124

Change 980280 merged by Vgutierrez:

[operations/alerts@master] traffic: Alert on configured and observed MSS mismatch

https://gerrit.wikimedia.org/r/980280

Change 982808 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/alerts@master] traffic: Provide a dashboard link for LVSRealServerMSS

https://gerrit.wikimedia.org/r/982808

Change 982808 merged by Vgutierrez:

[operations/alerts@master] traffic: Provide a dashboard link for LVSRealServerMSS

https://gerrit.wikimedia.org/r/982808

Change 991785 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] P:lvs: set monitoring enabled for IPIP-related services

https://gerrit.wikimedia.org/r/991785

Change 991785 merged by Ssingh:

[operations/puppet@production] P:lvs: set monitoring enabled for IPIP-related services

https://gerrit.wikimedia.org/r/991785