CoreDNS in the new k8s cluster cannot talk to the Cloud recursors
Closed, ResolvedPublic

Description

The new cluster deployment in toolsbeta times out trying to query DNS via the recursors. This is blocking progress on the Kubernetes upgrade goal.

Here's a sample from the logs. The random-label HINFO queries are CoreDNS's forward-plugin health checks; every query to either upstream times out.

2019-10-30T17:46:56.406Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:60866->208.80.154.143:53: i/o timeout
2019-10-30T17:46:59.406Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:58982->208.80.154.24:53: i/o timeout
2019-10-30T17:47:01.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:51183->208.80.154.24:53: i/o timeout
2019-10-30T17:47:02.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:54120->208.80.154.143:53: i/o timeout
2019-10-30T17:47:04.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:32950->208.80.154.143:53: i/o timeout
2019-10-30T17:47:07.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:43714->208.80.154.24:53: i/o timeout
2019-10-30T17:47:10.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:38917->208.80.154.143:53: i/o timeout
2019-10-30T17:47:13.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:33238->208.80.154.143:53: i/o timeout
2019-10-30T17:47:16.409Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:35531->208.80.154.143:53: i/o timeout
2019-10-30T17:47:19.409Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:36707->208.80.154.24:53: i/o timeout

Event Timeline

Bstorm created this task.

This seems to be because pods have originating IPs in the 192.168.0.0/16 range, which the pdns recursors were not expecting. I can add that range as permitted (which should be harmless) unless we want to rethink these origination IPs entirely.
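If we went the ACL route, the change would be a one-line addition to the recursors' allow-from list. A minimal sketch against a stock pdns-recursor recursor.conf (the existing ranges shown are placeholders, and the real config on our hosts is Puppet-managed, so the file and existing values will differ):

```ini
# /etc/powerdns/recursor.conf
# Existing permitted ranges (placeholders), plus the k8s pod network
# so pod-originated queries like 192.168.132.130 are no longer dropped.
allow-from=10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
```

pdns-recursor needs a restart (or `rec_control reload-acls`) to pick up allow-from changes.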

Bstorm claimed this task.

Turns out the nodes just needed a reboot! It was an iptables routing issue, not the recursor ACLs.