CoreDNS in the new k8s cluster cannot talk to the Cloud recursors
Closed, ResolvedPublic

Description

The new cluster deployment in toolsbeta times out trying to query DNS via the recursors. This is blocking progress on the Kubernetes upgrade goal.

Here's a sample from the logs. The random-label HINFO queries are CoreDNS's forward-plugin health checks; every query to either upstream times out.

2019-10-30T17:46:56.406Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:60866->208.80.154.143:53: i/o timeout
2019-10-30T17:46:59.406Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:58982->208.80.154.24:53: i/o timeout
2019-10-30T17:47:01.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:51183->208.80.154.24:53: i/o timeout
2019-10-30T17:47:02.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:54120->208.80.154.143:53: i/o timeout
2019-10-30T17:47:04.407Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:32950->208.80.154.143:53: i/o timeout
2019-10-30T17:47:07.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:43714->208.80.154.24:53: i/o timeout
2019-10-30T17:47:10.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:38917->208.80.154.143:53: i/o timeout
2019-10-30T17:47:13.408Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:33238->208.80.154.143:53: i/o timeout
2019-10-30T17:47:16.409Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:35531->208.80.154.143:53: i/o timeout
2019-10-30T17:47:19.409Z [ERROR] plugin/errors: 2 3169860986298190214.3171083079977244622. HINFO: read udp 192.168.132.130:36707->208.80.154.24:53: i/o timeout

Event Timeline

Bstorm created this task.

This seems to be because pods have originating IPs in the 192.168.0.0/16 range, which the pdns recursors were not expecting. I can add that range as permitted (which should be harmless) unless we want to rethink these origination IPs entirely.
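If we went the ACL route, the change would be a one-line addition to the recursors' allow-from list. A minimal sketch against a stock pdns-recursor recursor.conf (the existing ranges shown are placeholders, and the real config on our hosts is Puppet-managed, so the file and existing values will differ):

```ini
# /etc/powerdns/recursor.conf
# Existing permitted ranges (placeholders), plus the k8s pod network
# so pod-originated queries like 192.168.132.130 are no longer dropped.
allow-from=10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
```

pdns-recursor needs a restart (or `rec_control reload-acls`) to pick up allow-from changes.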

Bstorm claimed this task.

Turns out the nodes just needed a reboot! It was an iptables routing issue, not the recursor ACLs.