Toolforge: iptables flavor for Debian Buster-based k8s cluster
Closed, Resolved (Public)

Description

If you deploy a worker or master node on Debian Buster without any special configuration, you end up with a mixture of iptables-legacy and iptables-nft rules.
We have not detected any concrete issue yet, but the mix makes things more complex for no benefit.

Some examples to show what is going on.

On this host, something is using iptables-nft (apparently docker and k8s). Note that iptables-nft rulesets can be inspected with nft list ruleset:

root@toolsbeta-test-k8s-worker-1:~# nft list ruleset
table ip filter {
	chain INPUT {
		type filter hook input priority 0; policy accept;
		counter packets 14710 bytes 6562935 jump KUBE-FIREWALL
	}

	chain FORWARD {
		type filter hook forward priority 0; policy drop;
		counter packets 0 bytes 0 jump DOCKER-USER
		counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-1
		oifname "docker0" ct state related,established counter packets 0 bytes 0 accept
		oifname "docker0" counter packets 0 bytes 0 jump DOCKER
		iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 accept
		iifname "docker0" oifname "docker0" counter packets 0 bytes 0 accept
	}

	chain OUTPUT {
		type filter hook output priority 0; policy accept;
		counter packets 14672 bytes 3565219 jump KUBE-FIREWALL
	}

	chain DOCKER {
	}

	chain DOCKER-ISOLATION-STAGE-1 {
		iifname "docker0" oifname != "docker0" counter packets 0 bytes 0 jump DOCKER-ISOLATION-STAGE-2
		counter packets 0 bytes 0 return
	}

	chain DOCKER-ISOLATION-STAGE-2 {
		oifname "docker0" counter packets 0 bytes 0 drop
		counter packets 0 bytes 0 return
	}

	chain DOCKER-USER {
		counter packets 0 bytes 0 return
	}

	chain KUBE-FIREWALL {
[...]

On the same host, calico is using iptables-legacy directly:

root@toolsbeta-test-k8s-worker-1:~# iptables-legacy-save 
# Generated by iptables-save v1.8.2 on Wed Jul 17 10:45:24 2019
*raw
:PREROUTING ACCEPT [15159:6197518]
:OUTPUT ACCEPT [15094:3697571]
:cali-OUTPUT - [0:0]
:cali-PREROUTING - [0:0]
:cali-failsafe-in - [0:0]
:cali-failsafe-out - [0:0]
:cali-from-host-endpoint - [0:0]
:cali-to-host-endpoint - [0:0]
-A PREROUTING -m comment --comment "cali:6gwbT8clXdHdC1b1" -j cali-PREROUTING
-A OUTPUT -m comment --comment "cali:tVnHkvAo15HuiPy0" -j cali-OUTPUT
-A cali-OUTPUT -m comment --comment "cali:njdnLwYeGqBJyMxW" -j MARK --set-xmark 0x0/0xf0000
-A cali-OUTPUT -m comment --comment "cali:rz86uTUcEZAfFsh7" -j cali-to-host-endpoint
-A cali-OUTPUT -m comment --comment "cali:pN0F5zD0b8yf9W1Z" -m mark --mark 0x10000/0x10000 -j ACCEPT
-A cali-PREROUTING -m comment --comment "cali:XFX5xbM8B9qR10JG" -j MARK --set-xmark 0x0/0xf0000
-A cali-PREROUTING -i cali+ -m comment --comment "cali:EWMPb0zVROM-woQp" -j MARK --set-xmark 0x40000/0x40000
-A cali-PREROUTING -m comment --comment "cali:Ek_rsNpunyDlK3sH" -m mark --mark 0x0/0x40000 -j cali-from-host-endpoint
-A cali-PREROUTING -m comment --comment "cali:nM-DzTFPwQbQvtRj" -m mark --mark 0x10000/0x10000 -j ACCEPT
-A cali-failsafe-in -p tcp -m comment --comment "cali:wWFQM43tJU7wwnFZ" -m multiport --dports 22 -j ACCEPT
-A cali-failsafe-in -p udp -m comment --comment "cali:LwNV--R8MjeUYacw" -m multiport --dports 68 -j ACCEPT
-A cali-failsafe-in -p tcp -m comment --comment "cali:QOO5NUOqOSS1_Iw0" -m multiport --dports 179 -j ACCEPT
-A cali-failsafe-in -p tcp -m comment --comment "cali:cwZWoBSwVeIAZmVN" -m multiport --dports 2379 -j ACCEPT
-A cali-failsafe-in -p tcp -m comment --comment "cali:7FbNXT91kugE_upR" -m multiport --dports 2380 -j ACCEPT
-A cali-failsafe-in -p tcp -m comment --comment "cali:ywE9WYUBEpve70WT" -m multiport --dports 6666 -j ACCEPT
-A cali-failsafe-in -p tcp -m comment --comment "cali:l-WQSVBf_lygPR0J" -m multiport --dports 6667 -j ACCEPT
-A cali-failsafe-in -p udp -m comment --comment "cali:k9jPBsnz833bYNtN" -m multiport --sports 53 -j ACCEPT
-A cali-failsafe-in -p udp -m comment --comment "cali:h6bDkHXiHjFdQFvi" -m multiport --sports 67 -j ACCEPT
-A cali-failsafe-in -p tcp -m comment --comment "cali:ZxyjJQRmKuKXDHob" -m multiport --sports 179 -j ACCEPT
[...]
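
A quick way to spot such a mixture is to count the rules each backend holds. This is only a rough sketch: count_rules and backend_state are hypothetical helper names, and it assumes the iptables-legacy-save and iptables-nft-save binaries from the Buster iptables package are available and run as root.

```shell
# Hypothetical helpers, not part of any package.

# count_rules: read iptables-save style text on stdin and print how many
# rule lines (lines starting with "-A ") it contains.
count_rules() {
    # grep -c exits non-zero when it counts 0 matches; keep the status clean.
    grep -c '^-A ' || true
}

# backend_state LEGACY_COUNT NFT_COUNT: print which backend holds rules.
backend_state() {
    if [ "$1" -gt 0 ] && [ "$2" -gt 0 ]; then
        echo mixed
    elif [ "$1" -gt 0 ]; then
        echo legacy
    elif [ "$2" -gt 0 ]; then
        echo nft
    else
        echo empty
    fi
}

# Intended usage on a node, as root:
#   legacy=$(iptables-legacy-save | count_rules)
#   nft=$(iptables-nft-save | count_rules)
#   backend_state "$legacy" "$nft"
```

On a host like the one above, both counts should be non-zero, so backend_state would report mixed.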

We have two options:

  • switch everything to iptables-nft (the default in Debian Buster [0])
  • switch everything to iptables-legacy

using iptables-nft

This would be my preferred approach, but it requires making sure that the 3 key components (docker, calico, kube-proxy) can all work together.

  1. calico has a config switch to select which iptables backend to use (nft or legacy), and it defaults to legacy in order to support older containers [1]. Unfortunately, the calico-node pod hardcodes iptables-legacy [6].
  2. Docker briefly hardcoded iptables-legacy in its source code, but that was already reverted, so it will use the system default [2].
  3. kube-proxy seems to work fine with both flavors of iptables [3], but some people still report issues when using Debian Buster.
  4. kubelet had some issues with iptables before 1.8.3 [4]. This will require an iptables backport, which will be done shortly [5].
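
To check whether a node still needs the backport mentioned in the kubelet item, the installed iptables version can be compared against 1.8.3. A minimal sketch, assuming GNU sort (for -V); version_at_least is a hypothetical helper name:

```shell
# version_at_least HAVE WANT: succeed when version HAVE is >= WANT.
# Relies on GNU sort -V (version sort), so 1.8.10 sorts after 1.8.3.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Intended usage on a node:
#   have=$(iptables --version | awk '{print $2}' | tr -d v)
#   version_at_least "$have" 1.8.3 || echo "iptables backport needed"
```

With the stock Buster package (v1.8.2, as seen in the iptables-legacy-save header above) the check would fail and flag the backport as needed.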

So this option would require switching calico to iptables-nft by means of the config file, plus making extra sure that kube-proxy and docker are happy as well.
This may mean a bit more work to double-check that everything works as expected. On the other hand, it is far more future-proof, since future Debian releases may lean even more towards nftables.

Special mention goes to calico-node hardcoding iptables-legacy [6], which may force us to use iptables-legacy everywhere.

using iptables-legacy

The main reason to use iptables-legacy seems to be supporting pods that run older binaries (so that binaries/libs in the container match those in the base OS). I don't think we have this use case in Toolforge.
In any case, this path should be even easier to follow, since:

  1. kube-proxy apparently uses the system binary (which we control by means of update-alternatives)
  2. docker uses the system binary (which we control by means of update-alternatives)
  3. calico uses iptables-legacy by default, so no special config is required here

This means that with a single update-alternatives call per alternative everything should be on the same page:

# update-alternatives --set iptables /usr/sbin/iptables-legacy
# update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
# update-alternatives --set arptables /usr/sbin/arptables-legacy
# update-alternatives --set ebtables /usr/sbin/ebtables-legacy
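
After switching, the result can be checked from the version banner, which in iptables 1.8.x names the backend in parentheses: (legacy) or (nf_tables). A small sketch; backend_from_version is a hypothetical helper name:

```shell
# backend_from_version BANNER: print the backend named in an
# "iptables --version" banner such as "iptables v1.8.2 (legacy)".
backend_from_version() {
    case "$1" in
        *"(legacy)"*)    echo legacy ;;
        *"(nf_tables)"*) echo nft ;;
        *)               echo unknown ;;
    esac
}

# Intended usage on a node:
#   backend_from_version "$(iptables --version)"
#   backend_from_version "$(ip6tables --version)"
```

If the alternatives were set as above, both calls should print legacy.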

[0] https://wiki.debian.org/nftables
[1] https://docs.projectcalico.org/v3.8/reference/resources/felixconfig#spec
[2] https://github.com/docker/libnetwork/pull/2343/files
[3] https://github.com/kubernetes/kubernetes/issues/71305
[4] https://github.com/kubernetes/kubernetes/issues/79304
[5] https://tracker.debian.org/pkg/iptables
[6] https://github.com/projectcalico/node/commit/12b7bfbabaaef26fa75041ca160e2f8c4792d7e0

Event Timeline

aborrero triaged this task as Medium priority. Jul 17 2019, 11:05 AM
aborrero created this task.
aborrero updated the task description. Jul 17 2019, 11:31 AM
aborrero updated the task description. Jul 17 2019, 12:43 PM

Change 524175 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: k8s: introduce calico workaround for the iptables backend in buster

https://gerrit.wikimedia.org/r/524175

Mentioned in SAL (#wikimedia-cloud) [2019-07-18T09:28:55Z] <arturo> re-create toolsbeta-test-k8s-master-{1,2,3} as buster to test T228267

Change 524175 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: k8s: introduce calico workaround for the iptables backend in buster

https://gerrit.wikimedia.org/r/524175

aborrero closed this task as Resolved. Jul 18 2019, 11:58 AM

This is apparently solved.

Will probably need to revisit this in future Debian/calico/kubeadm/k8s releases.