
CoreDNS in the new k8s cluster cannot talk to the Cloud recursors
Closed, ResolvedPublic

Description

The new cluster deployment in toolsbeta times out trying to query DNS via the recursors. This is blocking progress on the Kubernetes upgrade goal.

Here's a sample from the CoreDNS logs:

2019-10-30T17:46:56.406Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:60866->208.80.154.143:53: i/o timeout
2019-10-30T17:46:59.406Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:58982->208.80.154.24:53: i/o timeout
2019-10-30T17:47:01.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:51183->208.80.154.24:53: i/o timeout
2019-10-30T17:47:02.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:54120->208.80.154.143:53: i/o timeout
2019-10-30T17:47:04.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:32950->208.80.154.143:53: i/o timeout
2019-10-30T17:47:07.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:43714->208.80.154.24:53: i/o timeout
2019-10-30T17:47:10.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:38917->208.80.154.143:53: i/o timeout
2019-10-30T17:47:13.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:33238->208.80.154.143:53: i/o timeout
2019-10-30T17:47:16.409Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:35531->208.80.154.143:53: i/o timeout
2019-10-30T17:47:19.409Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:36707->208.80.154.24:53: i/o timeout
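
A quick way to reproduce from inside the cluster (illustrative only; assumes kubectl access to the toolsbeta cluster and that a busybox image can be pulled):

# resolve through the in-cluster CoreDNS service
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup wikimedia.org
# query one of the recursors directly to take CoreDNS out of the path
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup wikimedia.org 208.80.154.143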

Event Timeline

Bstorm triaged this task as High priority. · Wed, Nov 6, 3:25 PM
Bstorm created this task.
Restricted Application added a subscriber: Aklapper. · Wed, Nov 6, 3:25 PM
Andrew added a subscriber: Andrew. · Wed, Nov 6, 3:40 PM

This seems to be because pods have originating IPs in the 192.168.0.0/16 range, which the PowerDNS recursors were not expecting. I can add that range to the allowed ranges (which should be harmless) unless we want to rethink these origination IPs entirely.
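
If we go the ACL route, the change would be roughly the following on the recursors (a minimal sketch, assuming they are PowerDNS Recursor using an allow-from ACL; the actual config is puppet-managed and the existing allow-from contents are not shown in this task):

# /etc/powerdns/recursor.conf (illustrative path)
# append the pod network to the query ACL
allow-from=<existing ranges>, 192.168.0.0/16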

Bstorm closed this task as Resolved. · Wed, Nov 6, 4:15 PM
Bstorm claimed this task.

It turns out the nodes just needed a reboot! It was an iptables routing issue.
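
(For reference, one way to sanity-check the node-level iptables state before resorting to a reboot; these commands are hypothetical and not recorded in the task, and assume shell access to the worker nodes:)

# dump the NAT rules on a worker node; compare with a freshly rebooted node to spot stale entries
sudo iptables -t nat -S
# check packet counters on the POSTROUTING chain to see whether pod traffic is matching the expected rules
sudo iptables -t nat -L POSTROUTING -n -v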