Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait
Open, Needs Triage, Public

Description

We're setting the sysctl values "net.netfilter.nf_conntrack_tcp_timeout_time_wait" and "net.netfilter.nf_conntrack_max" in /etc/sysctl.d/70-ferm_conntrack.conf (configured via base::firewall). "net.netfilter.nf_conntrack_max" is reliably set, but on 400 out of approx. 1000 systems with that setting, net.netfilter.nf_conntrack_tcp_timeout_time_wait is at the kernel default value of 120. This affects systems using both upstart and systemd and seems to be caused by a race condition: the outcome depends on whether the nf_conntrack kernel module is loaded before or after the sysctl value is set.
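
One way to find the affected hosts is to compare the values in the sysctl.d file with what the kernel currently exposes under /proc/sys. The following is a minimal Python sketch, assuming the simple "key = value" format of /etc/sysctl.d/*.conf files; the helper names parse_sysctl_conf() and check_drift() are hypothetical and not part of base::firewall or any existing tool.

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse "key = value" lines, skipping blank lines and #/; comments.

    Hypothetical helper: handles only the plain dotted-key format.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_drift(settings, proc_root=Path("/proc/sys")):
    """Return {key: (configured, live)} for every key whose live value differs.

    live is None when the /proc/sys entry does not exist, e.g. because the
    kernel module providing it has not been loaded. Whitespace is normalised
    so that multi-value settings compare equal to their tab-separated form.
    """
    drift = {}
    for key, wanted in settings.items():
        # Dotted sysctl keys map to slash-separated paths below /proc/sys.
        path = proc_root / key.replace(".", "/")
        try:
            live = " ".join(path.read_text().split())
        except FileNotFoundError:
            live = None
        if live != " ".join(wanted.split()):
            drift[key] = (wanted, live)
    return drift
```

On a host this would be called as check_drift(parse_sysctl_conf(Path("/etc/sysctl.d/70-ferm_conntrack.conf").read_text())); an affected system would report the time_wait key as configured 65 but live 120.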

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. May 24 2016, 3:06 PM

So the problem occurs whenever /etc/sysctl.d/70-ferm_conntrack.conf is processed before ferm has been started (ferm is what loads the nf_conntrack kernel module). Until the kernel module is loaded, the sysctl setting doesn't exist, so sysctl fails and only prints:

sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait: No such file or directory

When the kernel module is loaded later, the setting comes up with the kernel default of 120. On jessie, sysctl settings are applied by systemd-sysctl.service, which loads all sysctl settings from /etc/sysctl.d, not just the ferm-related ones, so making it depend on ferm.service being up won't really fly.
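
The failure mode can be modelled as a loader that writes each value exactly once: if the /proc/sys entry does not exist yet, the write fails, nothing retries it, and a module loaded afterwards starts with its built-in default. This is a simplified Python sketch of that behaviour, not the actual code of sysctl or systemd-sysctl; apply_settings() is a hypothetical name.

```python
import os
import sys
from pathlib import Path


def apply_settings(settings: dict[str, str],
                   proc_root: Path = Path("/proc/sys")) -> list[str]:
    """Write each setting exactly once; return the keys that could not be written.

    Hypothetical model of a one-shot sysctl loader, not the real implementation.
    """
    failed = []
    for key, value in settings.items():
        path = proc_root / key.replace(".", "/")
        try:
            # No O_CREAT: an entry the kernel has not registered stays missing.
            fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
        except FileNotFoundError:
            # Report the error and move on; nothing re-applies this key later.
            print(f"cannot stat {path}: No such file or directory", file=sys.stderr)
            failed.append(key)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(value + "\n")
    return failed
```

Because the write happens only once, whether the configured value sticks depends purely on whether the module was loaded before this point, which is the race described above.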

So it's probably best to set the ferm-related sysctl settings in a script which is run after ferm is started, e.g. by creating a ferm-sysctl.service which depends on ferm.service.

I've written a separate systemd unit, ferm-sysctl.service, which is started after ferm itself and sets the correct values. After some tests on multatuli this works fine. This still needs to be puppetised, and I also need to check whether the problem happens on trusty.

faidon added a subscriber: faidon. Oct 28 2016, 3:12 PM

Also see T148986, which describes a different boot-time race with ferm.

This also affects trusty hosts. I'll also make the net.netfilter.nf_conntrack_max value configurable via Hiera.

Change 320197 had a related patch set uploaded (by Muehlenhoff):
Load connection tracking sysctl values via a separate systemd unit

https://gerrit.wikimedia.org/r/320197

Change 320590 had a related patch set uploaded (by Muehlenhoff):
Configure connection tracking sysctl settings in ferm

https://gerrit.wikimedia.org/r/320590

Change 320590 abandoned by Muehlenhoff:
Configure connection tracking sysctl settings in ferm

Reason:
That did not work out as expected

https://gerrit.wikimedia.org/r/320590

Change 349193 had a related patch set uploaded (by Muehlenhoff):
[operations/puppet@production] Load nf_conntrack via /etc/modules-load.d/

https://gerrit.wikimedia.org/r/349193

Change 349193 merged by Muehlenhoff:
[operations/puppet@production] Load nf_conntrack via /etc/modules-load.d/

https://gerrit.wikimedia.org/r/349193

Change 349392 had a related patch set uploaded (by Muehlenhoff):
[operations/puppet@production] Load nf_conntrack via /etc/modules-load.d/

https://gerrit.wikimedia.org/r/349392

Change 349392 merged by Muehlenhoff:
[operations/puppet@production] Load nf_conntrack via /etc/modules-load.d/

https://gerrit.wikimedia.org/r/349392

Change 320197 abandoned by Muehlenhoff:
Load connection tracking sysctl values via a separate systemd unit

Reason:
Abandon in favour of https://gerrit.wikimedia.org/r/#/c/319071/ which loads the nf_conntrack module via /etc/modules-load.d

https://gerrit.wikimedia.org/r/320197

MoritzMuehlenhoff closed this task as Resolved. Apr 21 2017, 11:24 AM

This is now fixed by loading the nf_conntrack module via /etc/modules-load.d, which is processed before systemd-sysctl.service runs and thus avoids the race.
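
When checking a host after this change, it helps to confirm first that the module listed in /etc/modules-load.d is actually loaded, since that separates "module missing" from "module loaded but setting not applied". A minimal Python sketch, assuming the modules-load.d(5) file format (one module name per line; blank lines and lines starting with # or ; are ignored) and that the first field of each /proc/modules line is the module name; the helper names are hypothetical.

```python
def configured_modules(text: str) -> list[str]:
    """Module names listed in a modules-load.d file (hypothetical helper)."""
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and line[0] not in "#;":
            names.append(line)
    return names


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Names of loaded modules, taken from the first field of /proc/modules."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(conf_text: str, proc_modules_text: str) -> list[str]:
    """Configured modules that do not appear in /proc/modules."""
    loaded = loaded_modules(proc_modules_text)
    # The kernel reports names with "_", while config files may use "-".
    return [m for m in configured_modules(conf_text)
            if m.replace("-", "_") not in loaded]
```

On a host, the inputs would be the contents of /etc/modules-load.d/conntrack.conf and /proc/modules; an empty result means the module is loaded, so any remaining mismatch lies in the sysctl step.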

Mentioned in SAL (#wikimedia-operations) [2017-04-29T10:50:18Z] <elukey> set sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 to kafka[1018,1020,1022].eqiad.wmnet (was 120 - maybe related to T136094 ?)

MoritzMuehlenhoff reopened this task as Open. May 5 2017, 2:43 PM

Despite what's documented in the sysctl.d(5) manpage, this does not fix the race: kafka1018 was rebooted two hours ago and has nf_conntrack loaded via /etc/modules-load.d/conntrack.conf, but the sysctl value still isn't applied correctly.

The modules-load.d approach mentioned in sysctl.d(5) isn't sufficiently race-free: while systemd-sysctl.service is ordered "After=systemd-modules-load.service", systemd-modules-load only initiates the loading of the kernel modules via kmod but doesn't wait until the modules are loaded. To confirm this, I've run both service units in debug mode:

May 29 15:05:54 multatuli systemd-modules-load[244]: apply: /etc/modules-load.d/conntrack.conf
May 29 15:05:54 multatuli systemd-modules-load[244]: load: nf_conntrack
May 29 15:05:54 multatuli systemd-modules-load[244]: Inserted module 'nf_conntrack'
May 29 15:05:54 multatuli systemd-modules-load[244]: apply: /etc/modules-load.d/modules.conf
May 29 15:05:54 multatuli systemd-sysctl[250]: parse: /etc/sysctl.d/10-ubuntu-defaults.conf
May 29 15:05:54 multatuli systemd-sysctl[250]: parse: /etc/sysctl.d/60-wikimedia-base.conf
May 29 15:05:54 multatuli systemd-sysctl[250]: parse: /etc/sysctl.d/70-core_dumps.conf
May 29 15:05:54 multatuli systemd-sysctl[250]: parse: /etc/sysctl.d/70-disable_unprivileged_bpf.conf
May 29 15:05:54 multatuli systemd-sysctl[250]: parse: /etc/sysctl.d/70-ferm_conntrack.conf
May 29 15:05:54 multatuli systemd-sysctl[250]: Setting 'fs/protected_hardlinks' to '1'
(..)
May 29 15:05:54 multatuli systemd-sysctl[250]: Setting 'net/netfilter/nf_conntrack_max' to '262144'
May 29 15:05:54 multatuli systemd-sysctl[250]: Setting 'net/netfilter/nf_conntrack_tcp_timeout_time_wait' to '65'
May 29 15:05:54 multatuli systemd-sysctl[250]: Failed to write '65' to '/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait': No such file or directory
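
One possible workaround, which is not what this task adopted, is to wait for the /proc/sys entry to appear before writing it, instead of writing once and giving up. A minimal Python sketch under that assumption; wait_and_set() and its timeout and polling interval are hypothetical choices, and something would still have to run it at boot (for example a unit ordered after ferm.service, as proposed earlier in this task).

```python
import os
import time
from pathlib import Path


def wait_and_set(key: str, value: str, timeout: float = 30.0,
                 interval: float = 0.5, proc_root: Path = Path("/proc/sys")) -> bool:
    """Poll until the sysctl entry exists, then write it.

    Hypothetical helper: returns True once the value has been written and
    False if the entry did not appear before the timeout expired.
    """
    path = proc_root / key.replace(".", "/")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # No O_CREAT: only write to an entry the kernel has registered.
            fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
        except FileNotFoundError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(value + "\n")
        return True
```

Polling makes the result independent of which unit happens to trigger the module load, at the cost of a bounded delay at boot.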

Mentioned in SAL (#wikimedia-operations) [2017-08-07T09:06:33Z] <elukey> set net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 (was 120) on all the analytics kafka brokers - T136094

elukey moved this task from Backlog to Keep an eye on it on the User-Elukey board. Aug 9 2017, 10:35 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-24T15:39:16Z] <elukey> set net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 to mw[1308-1311] - T136094