nf_conntrack warnings for kafka hosts
Closed, Resolved · Public

Description

A lot of warnings over the past few days:

19:07  <icinga-wm> PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
19:07  <icinga-wm> PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
19:08  <icinga-wm> PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
19:08  <icinga-wm> PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
19:08  <icinga-wm> PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
19:09  <icinga-wm> PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 90 % full
...
...

On one kafka host:

elukey@kafka1020:~$ sudo sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 262144
elukey@kafka1020:~$ sudo sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 233136

Output of less /proc/net/nf_conntrack:

....
....
ipv4     2 tcp      6 73 TIME_WAIT src=10.64.16.127 dst=10.64.53.12 sport=47835 dport=9092 src=10.64.53.12 dst=10.64.16.127 sport=9092 dport=47835 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 51 TIME_WAIT src=10.64.48.33 dst=10.64.53.12 sport=42706 dport=9092 src=10.64.53.12 dst=10.64.48.33 sport=9092 dport=42706 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 37 TIME_WAIT src=10.64.32.65 dst=10.64.53.12 sport=38930 dport=9092 src=10.64.53.12 dst=10.64.32.65 sport=9092 dport=38930 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 17 TIME_WAIT src=10.64.16.125 dst=10.64.53.12 sport=47311 dport=9092 src=10.64.53.12 dst=10.64.16.125 sport=9092 dport=47311 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 16 TIME_WAIT src=10.64.48.62 dst=10.64.53.12 sport=59539 dport=9092 src=10.64.53.12 dst=10.64.48.62 sport=9092 dport=59539 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 12 TIME_WAIT src=10.64.48.63 dst=10.64.53.12 sport=38423 dport=9092 src=10.64.53.12 dst=10.64.48.63 sport=9092 dport=38423 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 110 TIME_WAIT src=10.64.32.64 dst=10.64.53.12 sport=48972 dport=9092 src=10.64.53.12 dst=10.64.32.64 sport=9092 dport=48972 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 69 TIME_WAIT src=10.64.16.96 dst=10.64.53.12 sport=37687 dport=9092 src=10.64.53.12 dst=10.64.16.96 sport=9092 dport=37687 [ASSURED] mark=0 zone=0 use=2
...
...

Most of the sockets in TIME_WAIT are for connections between mwXXXX and kafka. netstat (-tuap) shows a different picture: a lot of ESTABLISHED connections between kafka hosts and with cpXXXX (as expected).
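
For reference, a quick way to get this breakdown (a sketch that relies on the /proc/net/nf_conntrack field layout shown above):

# count conntrack entries per TCP state
sudo awk '$3 == "tcp" {print $6}' /proc/net/nf_conntrack | sort | uniq -c | sort -rn

# top originators of TIME_WAIT entries (the first src= field is the original direction)
sudo awk '$3 == "tcp" && $6 == "TIME_WAIT" {print $7}' /proc/net/nf_conntrack | sort | uniq -c | sort -rn | head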

This is only affecting the Analytics cluster, not the EventBus one. The risk is dropped packets once nf_conntrack reaches 100% utilization.

A quick solution would be to increase the nf_conntrack_max value with sysctl (like https://gerrit.wikimedia.org/r/#/c/278290/1), but it would also be great to figure out what changed.
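
The sysctl-level stop-gap would look roughly like the following (a sketch; the actual change is applied via the linked Gerrit patches rather than by hand, and 524288 is the 512k value suggested in the comments below):

# immediate, non-persistent bump on one broker
sudo sysctl -w net.netfilter.nf_conntrack_max=524288

# verify the new limit against the current count
sudo sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_count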

Event Timeline

elukey created this task.
Herald added a subscriber: Aklapper.

We can easily bump that to 512k; we had the same workaround for the recent job runner problems. We can then properly analyse the root cause on Tuesday.

Change 279776 had a related patch set uploaded (by Elukey):
Bump nf_conntrack_max temporarily to allow proper investigation.

https://gerrit.wikimedia.org/r/279776

Change 279776 merged by Elukey:
Bump nf_conntrack_max temporarily to allow proper investigation.

https://gerrit.wikimedia.org/r/279776

ipv4     2 tcp      6 73 TIME_WAIT src=10.64.16.127 dst=10.64.53.12

TIME_WAIT is the state of the TCP peer that performs the active close, in this case the src (mw1147.eqiad.wmnet). This is why I didn't see any TIME_WAIT sockets in netstat -tunap on the kafka hosts.
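
A quick way to see this from both sides (a sketch; assumes ss from iproute2 and that 9092 is the broker port):

# on an mw host (the active closer): local sockets in TIME_WAIT towards the brokers
ss -tn state time-wait '( dport = :9092 )' | head

# on a kafka broker: the sockets are already gone, but conntrack keeps the flow around
# for nf_conntrack_tcp_timeout_time_wait seconds
sudo sysctl net.netfilter.nf_conntrack_tcp_timeout_time_wait
sudo grep -c TIME_WAIT /proc/net/nf_conntrack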

Last week, ApiAction Kafka logging was turned on from MW. I'm not sure how much this would have increased the number of connections from MW. Is it possible that this increase in conntrack_count is normal and expected? There are a lot of clients talking to the analytics Kafka brokers: all caches and all app servers, plus, periodically, many Hadoop nodes.

Conntrack counts are back to normal, around 115k, on all kafka1* hosts:

elukey@neodymium:~$ sudo salt kafka10*.eqiad.wmnet cmd.run 'sysctl net.netfilter.nf_conntrack_count'
kafka1012.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 134499
kafka1002.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 1769
kafka1020.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 134112
kafka1018.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 134508
kafka1001.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 1830
kafka1022.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 134734
kafka1013.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 134418
kafka1014.eqiad.wmnet:
    net.netfilter.nf_conntrack_count = 133684

The main open question here is whether connection bursts like this can happen again, and what triggered the increased connection count in the first place.
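
Until the graphing mentioned below is in place, a crude way to watch for bursts from neodymium (a sketch reusing the salt invocation above):

# poll the count across the brokers every 5 minutes
while true; do date; sudo salt 'kafka10*.eqiad.wmnet' cmd.run 'sysctl net.netfilter.nf_conntrack_count'; sleep 300; done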

Yeah, if it went back down, then it isn't related to the ApiAction logging in MW. That is still happening.

Change 280396 had a related patch set uploaded (by Elukey):
Add netfilter nf_conntrack diamond collector to kafka brokers.

https://gerrit.wikimedia.org/r/280396

Change 280396 merged by Elukey:
Add netfilter nf_conntrack diamond collector to kafka brokers.

https://gerrit.wikimedia.org/r/280396

Change 280622 had a related patch set uploaded (by Elukey):
Revert "Bump nf_conntrack_max temporarily to allow proper investigation."

https://gerrit.wikimedia.org/r/280622

Change 280622 abandoned by Elukey:
Revert "Bump nf_conntrack_max temporarily to allow proper investigation."

Reason:
Will re-send another code review

https://gerrit.wikimedia.org/r/280622

Change 280623 had a related patch set uploaded (by Elukey):
Restore nf_conntrack_max setting for the analytics kafka brokers.

https://gerrit.wikimedia.org/r/280623

Change 280623 merged by Elukey:
Restore nf_conntrack_max setting for the analytics kafka brokers.

https://gerrit.wikimedia.org/r/280623

As stated in the commit message:

"""
We added a graphite metric (using a diamond script) to track
nf_conntrack_count and observed the kafka hosts' behavior over
the past days. The count is back to ~130K as expected,
and we verified that the previous spikes were related to a surge
of TIME_WAIT flows (src mw hosts, dst kafka hosts).
Now that we have visibility and have established that the previous
nf_conntrack_count levels were abnormal, we can remove this
temporary measure.
"""

I am inclined to close this task and investigate further if the problem recurs.