
Investigate outgoing discarded packets in the codfw kubernetes cluster
Open, Stalled, Low, Public

Description

We've recently noticed that the codfw kubernetes cluster has a steady rate of outgoing discarded packets. While not alarming, the very steady rate[1] of these errors is peculiar and warrants further investigation.

As per the graph below

[Screenshot from 2019-06-20 16-36-39: graph of the outgoing discarded packet rate]

the error rate is ~5 discarded packets per second. These are outgoing packets. Digging into it some more, it does seem that these are ICMP redirects.
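For context: a Linux host that forwards a packet back out of the interface it arrived on, toward a next hop the original sender could have reached directly, answers the sender with an ICMP redirect (type 5; code 1 is "redirect for host"). The following is a simplified Python model of that decision, not the kernel's actual logic. The interface name and addresses are taken from the iptables log further down, the /22 prefix is a hypothetical choice, and the real check also depends on send_redirects, rate limiting and other conditions.

```python
import ipaddress

def would_send_redirect(in_iface: str, out_iface: str, src: str,
                        next_hop: str, subnet: str) -> bool:
    """Simplified model of the forwarding-path redirect decision.

    Only the core condition is modelled: the packet leaves through the
    interface it arrived on, and the original sender and the chosen next
    hop share a subnet, so the sender could have used the next hop directly.
    The real kernel check also honours send_redirects, rate limits, etc.
    """
    if in_iface != out_iface:
        return False
    net = ipaddress.ip_network(subnet)
    return ipaddress.ip_address(src) in net and ipaddress.ip_address(next_hop) in net

# Sender and gateway from the first logged redirect; the /22 prefix is hypothetical.
print(would_send_redirect("eno1", "eno1", "10.192.0.71", "10.192.0.1", "10.192.0.0/22"))  # prints True
```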

We already found a bug in sysstat while following up on this. See https://github.com/sysstat/sysstat/pull/226
It's worth noting that kubernetes

[1] https://grafana.wikimedia.org/d/000000366/network-performances-global


Event Timeline

https://grafana.wikimedia.org/d/PRA2F67Zz/t226237?orgId=1 was created to help debug this. It makes it clearer that these are indeed outgoing ICMP redirects.
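Which counters the dashboards plot is not stated here. A reasonable assumption is the kernel's Ip `OutDiscards` and Icmp `OutRedirects` counters in /proc/net/snmp; plausibly they are linked, since `ip_send_skb()` (which shows up in the perf trace later in this task) increments OutDiscards when `ip_local_out()` fails. The following minimal Python sketch parses that file's format (a header line of field names followed by a line of values per protocol) and turns two samples into a per-second rate:

```python
def parse_snmp(text: str) -> dict:
    """Parse /proc/net/snmp-style text into {"Ip": {...}, "Icmp": {...}, ...}.

    Each protocol appears as two consecutive lines, both prefixed with
    e.g. "Ip:": first the field names, then the integer values.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    stats = {}
    for names, values in zip(rows[0::2], rows[1::2]):
        proto = names[0].rstrip(":")
        stats[proto] = dict(zip(names[1:], (int(v) for v in values[1:])))
    return stats

def rate(before: dict, after: dict, proto: str, field: str, seconds: float) -> float:
    """Per-second increase of one counter between two parsed samples."""
    return (after[proto][field] - before[proto][field]) / seconds
```

On a live host, take `parse_snmp(open("/proc/net/snmp").read())` twice, some seconds apart, then call `rate(before, after, "Ip", "OutDiscards", seconds)` and the same for `("Icmp", "OutRedirects")`. If the assumption holds, both should come out around the ~5/s seen in the graph.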

After some experimenting with iptables to figure out what is going on, I've managed to capture these packets (and their drops?) in iptables and log them:

Jun 21 10:27:00 kubernetes2001 kernel: [2761869.040301] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.71 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=65274 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.71 DST=10.192.64.76 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=40517 DF PROTO=TCP SPT=40224 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:00 kubernetes2001 kernel: [2761869.084820] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.43 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=58467 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.43 DST=10.192.64.158 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=48860 DF PROTO=TCP SPT=50202 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:00 kubernetes2001 kernel: [2761869.114766] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.47 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=59798 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.47 DST=10.192.64.64 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=61492 DF PROTO=TCP SPT=60076 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:01 kubernetes2001 CRON[33673]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Jun 21 10:27:02 kubernetes2001 kernel: [2761871.333254] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.82 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=39837 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.82 DST=10.192.64.209 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=30743 DF PROTO=TCP SPT=35850 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:02 kubernetes2001 kernel: [2761871.560276] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.43 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=58542 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.43 DST=10.192.64.229 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=42189 DF PROTO=TCP SPT=50234 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:02 kubernetes2001 kernel: [2761871.565558] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.45 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=9172 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.45 DST=10.192.64.185 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=11634 DF PROTO=TCP SPT=42678 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:02 kubernetes2001 kernel: [2761871.570002] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.81 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=49701 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.81 DST=10.192.64.89 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=56826 DF PROTO=TCP SPT=43526 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:02 kubernetes2001 kernel: [2761871.594882] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.47 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=59837 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.47 DST=10.192.64.187 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=36299 DF PROTO=TCP SPT=60108 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:02 kubernetes2001 kernel: [2761871.749193] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.41 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=38157 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.41 DST=10.192.64.185 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=57642 DF PROTO=TCP SPT=45976 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:10 kubernetes2001 kernel: [2761879.364248] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.82 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=40232 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.82 DST=10.192.64.213 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=64972 DF PROTO=TCP SPT=35918 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:10 kubernetes2001 kernel: [2761879.399809] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.70 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=40392 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.70 DST=10.192.64.158 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=48014 DF PROTO=TCP SPT=40612 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:10 kubernetes2001 kernel: [2761879.441312] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.47 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=60115 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.47 DST=10.192.64.212 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=36078 DF PROTO=TCP SPT=60148 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:10 kubernetes2001 kernel: [2761879.607858] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.44 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=52813 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.44 DST=10.192.64.148 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=48133 DF PROTO=TCP SPT=51150 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:10 kubernetes2001 kernel: [2761879.875426] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.42 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=16304 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.42 DST=10.192.64.65 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=54650 DF PROTO=TCP SPT=48284 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:13 kubernetes2001 kernel: [2761882.150220] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.44 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=52914 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.44 DST=10.192.64.158 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=2289 DF PROTO=TCP SPT=51178 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:13 kubernetes2001 kernel: [2761882.157302] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.80 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=247 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.80 DST=10.192.64.185 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=41598 DF PROTO=TCP SPT=53098 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:13 kubernetes2001 kernel: [2761882.588703] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.51 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=29217 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.51 DST=10.192.64.158 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=18064 DF PROTO=TCP SPT=43812 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:14 kubernetes2001 kernel: [2761883.210573] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.80 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=352 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.80 DST=10.192.64.148 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=36291 DF PROTO=TCP SPT=53116 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:14 kubernetes2001 kernel: [2761883.245982] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.43 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=60423 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.43 DST=10.192.64.204 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=25463 DF PROTO=TCP SPT=50326 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:14 kubernetes2001 kernel: [2761883.253288] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.42 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=16922 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.42 DST=10.192.64.76 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=9742 DF PROTO=TCP SPT=48330 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:19 kubernetes2001 kernel: [2761888.885171] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.64 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=1282 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.64 DST=10.192.64.184 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=4111 DF PROTO=TCP SPT=36120 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:20 kubernetes2001 kernel: [2761889.705702] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.46 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=65091 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.46 DST=10.192.64.185 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=37439 DF PROTO=TCP SPT=37820 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:20 kubernetes2001 kernel: [2761889.749241] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.42 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=18543 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.42 DST=10.192.64.184 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=59182 DF PROTO=TCP SPT=48382 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:20 kubernetes2001 kernel: [2761889.782961] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.40 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=17194 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.40 DST=10.192.64.184 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=17238 DF PROTO=TCP SPT=59902 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:23 kubernetes2001 kernel: [2761892.185248] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.46 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=65216 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.46 DST=10.192.64.89 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=39502 DF PROTO=TCP SPT=37836 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:23 kubernetes2001 kernel: [2761892.249947] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.81 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=50191 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.81 DST=10.192.64.89 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=12936 DF PROTO=TCP SPT=43720 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:23 kubernetes2001 kernel: [2761892.675693] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.42 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=18635 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.42 DST=10.192.64.89 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=14956 DF PROTO=TCP SPT=48410 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:23 kubernetes2001 kernel: [2761892.712914] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.40 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=17432 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.40 DST=10.192.64.185 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=2476 DF PROTO=TCP SPT=59944 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:25 kubernetes2001 kernel: [2761894.281220] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.47 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=63082 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.47 DST=10.192.64.76 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=36252 DF PROTO=TCP SPT=60270 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:25 kubernetes2001 kernel: [2761894.290333] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.81 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=50270 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.81 DST=10.192.64.95 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=54216 DF PROTO=TCP SPT=43748 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:25 kubernetes2001 kernel: [2761894.299006] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.44 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=55929 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.44 DST=10.192.64.65 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=44996 DF PROTO=TCP SPT=51256 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:33 kubernetes2001 kernel: [2761902.971222] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.41 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=43717 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.41 DST=10.192.64.204 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=26352 DF PROTO=TCP SPT=46168 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:33 kubernetes2001 kernel: [2761902.991718] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.43 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=64867 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.43 DST=10.192.64.89 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=47011 DF PROTO=TCP SPT=50460 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:34 kubernetes2001 kernel: [2761903.022789] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.58 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=22793 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.58 DST=10.192.64.187 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=29401 DF PROTO=TCP SPT=53326 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:34 kubernetes2001 kernel: [2761903.798746] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.47 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=64714 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.47 DST=10.192.64.184 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=51073 DF PROTO=TCP SPT=60318 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:34 kubernetes2001 kernel: [2761903.803706] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.44 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=56136 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.44 DST=10.192.64.187 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=41143 DF PROTO=TCP SPT=51312 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:34 kubernetes2001 kernel: [2761903.809925] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.71 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=65405 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.71 DST=10.192.64.148 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=43373 DF PROTO=TCP SPT=40498 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:35 kubernetes2001 kernel: [2761904.058425] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.48 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=65387 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.48 DST=10.192.64.158 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=27781 DF PROTO=TCP SPT=37656 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:35 kubernetes2001 kernel: [2761904.066650] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.71 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=65415 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.71 DST=10.192.64.141 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=45677 DF PROTO=TCP SPT=40504 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:36 kubernetes2001 kernel: [2761905.028732] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.80 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=1897 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.80 DST=10.192.64.203 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=27248 DF PROTO=TCP SPT=53290 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:36 kubernetes2001 kernel: [2761905.039031] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.46 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=237 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.46 DST=10.192.64.148 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=3810 DF PROTO=TCP SPT=37926 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ] 
Jun 21 10:27:36 kubernetes2001 kernel: [2761905.044723] IN= OUT=eno1 SRC=10.192.0.11 DST=10.192.0.41 LEN=88 TOS=0x00 PREC=0xC0 TTL=64 ID=44124 PROTO=ICMP TYPE=5 CODE=1 GATEWAY=10.192.0.1 [SRC=10.192.0.41 DST=10.192.64.188 LEN=60 TOS=0x00 PREC=0x00 TTL=62 ID=50376 DF PROTO=TCP SPT=46216 DPT=8192 WINDOW=29200 RES=0x00 SYN URGP=0 ]
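Every redirect in the excerpt above has the same shape. kubernetes2001 (10.192.0.11) tells a 10.192.0.x host to use GATEWAY=10.192.0.1 for a TCP SYN to port 8192 on a 10.192.64.x address. The following minimal Python sketch summarises a larger capture the same way. It assumes the iptables LOG format shown above, where the offending inner packet is the bracketed part.

```python
import re
from collections import Counter

# key=value pairs such as "SRC=10.192.0.11"; empty values like "IN=" are skipped
FIELD = re.compile(r"(\w+)=(\S+)")

def parse_redirect(line: str):
    """Split an iptables LOG line for an ICMP error into (outer, inner) field dicts.

    The LOG target prints the quoted inner packet inside square brackets
    after the ICMP header fields.
    """
    outer, _, inner = line.partition("[SRC=")
    return dict(FIELD.findall(outer)), dict(FIELD.findall("SRC=" + inner))

def summarize(lines):
    """Count ICMP redirects (type 5) by suggested gateway, redirected host and inner TCP port."""
    parsed = [parse_redirect(l) for l in lines if "PROTO=ICMP TYPE=5" in l]
    return {
        "count": len(parsed),
        "gateway": Counter(o["GATEWAY"] for o, _ in parsed),
        "redirected_host": Counter(o["DST"] for o, _ in parsed),
        "inner_dport": Counter(i["DPT"] for _, i in parsed),
    }
```

Non-redirect lines (such as the CRON entry above) are ignored, so `summarize(open(path))` can be pointed directly at whichever kernel log file holds these entries (`path` is a placeholder).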

Using dropwatch, I get:

akosiaris@kubernetes2001:~$ sudo dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
1 drops at nf_hook_slow+99 (0xffffffffb0f4ba89)
1 drops at nf_hook_slow+99 (0xffffffffb0f4ba89)
2 drops at nf_hook_slow+99 (0xffffffffb0f4ba89)

So it looks like netfilter is dropping those ICMP redirects.
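Over a longer capture, tallying dropwatch's output by location is a quick way to confirm that essentially all drops come from the same site. The following is a small Python sketch, assuming the output format shown above. It just sums the per-line counts per symbol+offset.

```python
import re
from collections import Counter

# dropwatch prints lines like "2 drops at nf_hook_slow+99 (0xffffffffb0f4ba89)"
DROP_LINE = re.compile(r"^(\d+) drops? at (\S+) \((0x[0-9a-f]+)\)$")

def drops_by_location(output: str) -> Counter:
    """Sum dropwatch drop counts per symbol+offset location."""
    totals = Counter()
    for line in output.splitlines():
        m = DROP_LINE.match(line.strip())
        if m:
            totals[m.group(2)] += int(m.group(1))
    return totals

sample = """\
1 drops at nf_hook_slow+99 (0xffffffffb0f4ba89)
1 drops at nf_hook_slow+99 (0xffffffffb0f4ba89)
2 drops at nf_hook_slow+99 (0xffffffffb0f4ba89)"""
print(drops_by_location(sample))  # prints Counter({'nf_hook_slow+99': 4})
```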

Merging in as in P8652

Trying to figure out why the hell those ICMP redirects get discarded: https://grafana.wikimedia.org/d/PRA2F67Zz/t226237?orgId=1

Adding a trace:

sudo iptables -t raw -I OUTPUT 1 -p icmp  --icmp-type redirect -j TRACE

Kernel logs:

Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: raw:OUTPUT:policy:2  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.42 LEN=88 TOS=00 PREC=0xC0 TTL=64 ID=32296 PROTO=ICMP TYPE=5 CODE=1 MARK=0
Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: mangle:OUTPUT:policy:1  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.42 LEN=88 TOS=00 PREC=0xC0 TTL=64 ID=32296 PROTO=ICMP TYPE=5 CODE=1 MARK=0
Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: raw:OUTPUT:policy:2  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.43 LEN=88 TOS=00 PREC=0xC0 TTL=64 ID=53386 PROTO=ICMP TYPE=5 CODE=1 MARK=0
Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: mangle:OUTPUT:policy:1  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.43 LEN=88 TOS=00 PREC=0xC0 TTL=64 ID=53386 PROTO=ICMP TYPE=5 CODE=1 MARK=0
Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: raw:OUTPUT:policy:2  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.47 LEN=80 TOS=00 PREC=0xC0 TTL=64 ID=17607 PROTO=ICMP TYPE=5 CODE=1 MARK=0
Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: mangle:OUTPUT:policy:1  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.47 LEN=80 TOS=00 PREC=0xC0 TTL=64 ID=17607 PROTO=ICMP TYPE=5 CODE=1 MARK=0
Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: raw:OUTPUT:policy:2  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.61 LEN=88 TOS=00 PREC=0xC0 TTL=64 ID=63723 PROTO=ICMP TYPE=5 CODE=1 MARK=0
Jun 25 09:14:42 kubernetes2001 ulogd[33179]: TRACE: mangle:OUTPUT:policy:1  IN= OUT=eno1 MAC= SRC=10.192.0.11 DST=10.192.0.61 LEN=88 TOS=00 PREC=0xC0 TTL=64 ID=63723 PROTO=ICMP TYPE=5 CODE=1 MARK=0
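To check that no traced redirect ever gets past mangle, the TRACE lines can be grouped per packet. The following is a minimal Python sketch, assuming the ulogd line format shown above. It uses the IP ID field as the packet key; IDs get reused over time, so this is only reliable for short captures.

```python
import re
from collections import defaultdict

# "TRACE: <table>:<chain>:<rule|return|policy>:<rulenum>" followed by packet fields
TRACE = re.compile(r"TRACE: ([^:\s]+):([^:\s]+):([^:\s]+):(\d+)\s.*?\bID=(\d+)")

def trace_paths(lines):
    """Map each packet's IP ID to the table:chain steps TRACE logged for it, in order."""
    paths = defaultdict(list)
    for line in lines:
        m = TRACE.search(line)
        if m:
            table, chain, _kind, _rulenum, ip_id = m.groups()
            paths[ip_id].append(f"{table}:{chain}")
    return dict(paths)

def reached(paths, table):
    """IP IDs whose trace includes at least one step in the given table."""
    return {ip_id for ip_id, steps in paths.items()
            if any(step.startswith(table + ":") for step in steps)}
```

Applied to the lines above, every packet shows only `raw:OUTPUT` followed by `mangle:OUTPUT`, and `reached(paths, "nat")` and `reached(paths, "filter")` are both empty.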

But why on earth does this never reach the nat or filter tables?

The raw OUTPUT chain doesn't have much in it:

sudo iptables -t raw -nvxL OUTPUT
Chain OUTPUT (policy ACCEPT 1504 packets, 489837 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
akosiaris@kubernetes2001:~$

The mangle table is empty:

akosiaris@kubernetes2001:~$ sudo iptables -t mangle -nvxL 
Chain PREROUTING (policy ACCEPT 4915506 packets, 1194368467 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 411915 packets, 391957445 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 4503591 packets, 802411022 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 421129 packets, 144900211 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 4847121 packets, 940326433 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
akosiaris@kubernetes2001:~$

https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg only lists a "reroute check" between mangle OUTPUT and nat OUTPUT, but that would not block a packet, would it?
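The diagram's OUTPUT path can also be read off the hook priorities the kernel registers. The following is a minimal Python sketch, assuming the `NF_IP_PRI_*` values from include/uapi/linux/netfilter_ipv4.h and listing only the hooks relevant here (each runs only if its module is loaded). Given the last table that showed up in the TRACE output (mangle), it lists what runs next. This only narrows the search: a hook can drop a packet, or accept it without walking its table's rules, without producing any TRACE line.

```python
# Hook priorities on NF_INET_LOCAL_OUT (OUTPUT); the lowest value runs first.
# Values are the NF_IP_PRI_* constants from include/uapi/linux/netfilter_ipv4.h.
OUTPUT_HOOKS = {
    "raw": -300,        # NF_IP_PRI_RAW
    "conntrack": -200,  # NF_IP_PRI_CONNTRACK (no TRACE line of its own)
    "mangle": -150,     # NF_IP_PRI_MANGLE
    "nat": -100,        # NF_IP_PRI_NAT_DST, i.e. nat OUTPUT
    "filter": 0,        # NF_IP_PRI_FILTER
    "security": 50,     # NF_IP_PRI_SECURITY, only if iptable_security is loaded
}

def hooks_after(last_traced: str) -> list:
    """Hooks that would run on OUTPUT after `last_traced`, in execution order."""
    order = sorted(OUTPUT_HOOKS, key=OUTPUT_HOOKS.get)
    return order[order.index(last_traced) + 1:]

print(hooks_after("mangle"))  # prints ['nat', 'filter', 'security']
```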

Using perf record also leads to the same conclusion as dropwatch about where the packets are dropped/discarded:

$ sudo perf record -g -a -e skb:kfree_skb
$ sudo perf script
swapper     0 [000] 3102101.053096: skb:kfree_skb: skbaddr=0xffff89015793cc00 protocol=2048 location=0xffffffffb0f4ba89
            7fffb0efef2f kfree_skb+0x80004f60206f ([kernel.kallsyms])
            7fffb0f4da89 nf_hook_slow+0x80004f602099 ([kernel.kallsyms])
            7fffb0efef2f kfree_skb+0x80004f60206f ([kernel.kallsyms])
            7fffb0f4da89 nf_hook_slow+0x80004f602099 ([kernel.kallsyms])
            7fffb0f5a343 __ip_local_out+0x80004f6020e3 ([kernel.kallsyms])
            7fffb0f582e0 dst_output+0x80004f602000 ([kernel.kallsyms])
            7fffb0f5a3c7 ip_local_out+0x80004f602017 ([kernel.kallsyms])
            7fffb0f5b5c5 ip_send_skb+0x80004f602015 ([kernel.kallsyms])
            7fffb0f8bd5b __icmp_send+0x80004f60246b ([kernel.kallsyms])
            7fffb0f53958 ip_rt_send_redirect+0x80004f6021c8 ([kernel.kallsyms])
            7fffb0f56d0f ip_forward+0x80004f60246f ([kernel.kallsyms])
            7fffb0f5493b ip_rcv_finish+0x80004f6020ab ([kernel.kallsyms])
            7fffb0f55304 ip_rcv+0x80004f602294 ([kernel.kallsyms])
            7fffb0f54890 ip_rcv_finish+0x80004f602000 ([kernel.kallsyms])
            7fffb0f1343d __netif_receive_skb_core+0x80004f60251d ([kernel.kallsyms])
            7fffb0beb30c kmem_cache_alloc+0x80004f60211c ([kernel.kallsyms])
            7fffb0f911a8 inet_gro_receive+0x80004f6021f8 ([kernel.kallsyms])
            7fffb0f139df netif_receive_skb_internal+0x80004f60202f ([kernel.kallsyms])
            7fffb0f147d8 napi_gro_receive+0x80004f6020b8 ([kernel.kallsyms])
            7fffc02b6d38 tg3_poll_work+0x800040002a58 ([kernel.kallsyms])
            7fffc02b722a tg3_poll_msix+0x80004000203a ([kernel.kallsyms])
            7fffb0f14226 net_rx_action+0x80004f602246 ([kernel.kallsyms])
            7fffb102039d __do_softirq+0x80004f60210d ([kernel.kallsyms])
            7fffb0a82c62 irq_exit+0x80004f6020c2 ([kernel.kallsyms])
            7fffb101f427 do_IRQ+0x80004f602057 ([kernel.kallsyms])
            7fffb101d196 ret_from_intr+0x80004f602000 ([kernel.kallsyms])
            7fffb0ede8c2 cpuidle_enter_state+0x80004f6020a2 ([kernel.kallsyms])
            7fffb0abfed4 cpu_startup_entry+0x80004f602154 ([kernel.kallsyms])
            7fffb1740f5e start_kernel+0x80004f602447 ([kernel.kallsyms])
            7fffb1740120 early_idt_handler_common+0x80004f602000 ([kernel.kallsyms])
            7fffb1740408 x86_64_start_kernel+0x80004f60214c ([kernel.kallsyms])
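Reading the stack outermost-first makes the path explicit. The redirect is generated on the forwarding path (ip_forward -> ip_rt_send_redirect -> __icmp_send). It is then freed inside nf_hook_slow on the local-output path (ip_send_skb -> ip_local_out -> __ip_local_out). The following is a small Python sketch for extracting that from `perf script` output, assuming the frame format shown above.

```python
import re

# Frame lines look like "<addr> <symbol>+0x<offset> ([kernel.kallsyms])"
FRAME = re.compile(r"^\s*[0-9a-f]+\s+([A-Za-z_]\w*)\+0x[0-9a-f]+\s")

def call_chain(stack_text: str) -> list:
    """Symbols of one perf-script stack, innermost (where the event fired) first."""
    chain = []
    for line in stack_text.splitlines():
        m = FRAME.match(line)
        if m:
            chain.append(m.group(1))
    return chain

def outermost_first(chain: list, start: str) -> list:
    """Frames from `start` inward, i.e. the call path in call order."""
    return list(reversed(chain[: chain.index(start) + 1]))
```

For the trace above, `outermost_first(call_chain(text), "ip_forward")` begins with `ip_forward`, `ip_rt_send_redirect`, `__icmp_send`, `ip_send_skb` and ends in `nf_hook_slow` / `kfree_skb`.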

Mentioned in SAL (#wikimedia-operations) [2019-06-25T12:27:12Z] <akosiaris> fully depool kubernetes2001 T226237

Change 618239 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] kubernetes: Stop sending ICMP redirects

https://gerrit.wikimedia.org/r/618239

Change 618239 merged by Alexandros Kosiaris:
[operations/puppet@production] kubernetes: Stop sending ICMP redirects

https://gerrit.wikimedia.org/r/618239
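The contents of change 618239 are not shown in this task. Assuming it works through the `net.ipv4.conf.*.send_redirects` sysctls, note that the kernel ORs conf/all with the per-interface value. Both must therefore be 0 for an interface to stop sending redirects; conf/default only seeds interfaces created later. The following is a minimal Python sketch to check a host, assuming /proc/sys as the source:

```python
from pathlib import Path

CONF_DIR = Path("/proc/sys/net/ipv4/conf")

def effective_send_redirects(all_value: int, iface_value: int) -> bool:
    """Redirects are sent if either conf/all or conf/<iface> send_redirects is non-zero."""
    return bool(all_value or iface_value)

def read_send_redirects(name: str, conf_dir: Path = CONF_DIR) -> int:
    """Read conf/<name>/send_redirects (name is "all", "default" or an interface)."""
    return int((conf_dir / name / "send_redirects").read_text().strip())

def may_send_redirects(iface: str, conf_dir: Path = CONF_DIR) -> bool:
    """True if ICMP redirects can still be emitted out `iface` (e.g. "eno1")."""
    return effective_send_redirects(read_send_redirects("all", conf_dir),
                                    read_send_redirects(iface, conf_dir))
```

On a node where the change is in effect, `may_send_redirects("eno1")` would be expected to return False.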

akosiaris changed the task status from Open to Stalled. Aug 4 2020, 9:21 AM
akosiaris moved this task from Unused 🥌 to Incoming 🐫 on the serviceops board.

Stalled in the hope that I'll have some time in the next few weeks to get into the kernel source again and figure out why those ICMP redirects get discarded.