PAWS Kubernetes cluster forgot how to forward packets between nodes
Closed, Duplicate (Public)

Description

Sometime today the PAWS k8s cluster suddenly had a total networking meltdown. Pods couldn't talk to each other or to the outside internet. Hosts could talk to the pods that were running on them, but to nobody else. After a lot of digging, running the following command on all hosts fixed it: sudo iptables -P FORWARD ACCEPT. The fix came from https://github.com/kubernetes/kubernetes/issues/40182, but that issue shouldn't have suddenly started affecting us, since we have been running Docker 1.13 on these hosts pretty much from the beginning. Nothing seems to have triggered the outage, and rebooting didn't fix it either. Now that the command has been run, rebooting doesn't seem to make a difference: the node still works fine. The appropriate sysctl has always been on.
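For anyone checking another node, here is a minimal sketch of how the FORWARD chain's default policy could be inspected before applying the fix. It assumes the symptom is a default policy of DROP, which is what the command above implies. The function names forward_policy and ensure_forward_accept are hypothetical helpers, not part of iptables or of the PAWS tooling.

```shell
#!/bin/sh
# Sketch only: forward_policy and ensure_forward_accept are hypothetical names.

# Read the output of `iptables -S FORWARD` on stdin and print the chain's
# default policy. The policy line has the form "-P FORWARD <POLICY>".
forward_policy() {
    awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# Switch the default policy to ACCEPT only if it is currently DROP.
# Needs root, so it is defined here but not called automatically.
ensure_forward_accept() {
    policy=$(sudo iptables -S FORWARD | forward_policy)
    printf 'FORWARD policy before change: %s\n' "$policy"
    if [ "$policy" = "DROP" ]; then
        sudo iptables -P FORWARD ACCEPT
    fi
}
```

Running ensure_forward_accept on each host would apply the same change as the one-line command, while also recording what the policy was beforehand, which may help when trying to work out what changed it.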

No idea what happened!

Event Timeline

The first notification that it was broken came on IRC at 2017-10-03T21:34. Judging from the last activity report, things were most likely still working an hour or two before that.

On a personal note, at some point during the debugging I was ready to give up and let PAWS die. Thankfully, with some encouragement from @bd808 and @chasemp, that did not happen. However, without stronger institutional support, I'm unsure how much longer PAWS can survive. @bd808, @chasemp and @madhuvishy need more help & resources if PAWS is to continue to thrive.

yuvipanda renamed this task from "Mysterious iptables rule suddenly required to keep networking inside pods working" to "PAWS outage of unknown cause". Oct 4 2017, 12:49 AM
yuvipanda added a subscriber: Halfak.
bd808 renamed this task from "PAWS outage of unknown cause" to "PAWS Kubernetes cluster forgot how to forward packets between nodes". Oct 6 2017, 4:26 PM