The kube-controllers of calico 3.17.1 fail to reach the apiserver for quite some time after being started
2021-01-07 13:29:49.792 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"} W0107 13:29:49.796530 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work. 2021-01-07 13:29:49.800 [INFO][1] main.go 109: Ensuring Calico datastore is initialized 2021-01-07 13:29:49.839 [INFO][1] main.go 149: Getting initial config snapshot from datastore 2021-01-07 13:29:49.863 [INFO][1] main.go 152: Got initial config snapshot 2021-01-07 13:29:49.864 [INFO][1] watchersyncer.go 89: Start called 2021-01-07 13:29:49.864 [INFO][1] main.go 169: Starting status report routine 2021-01-07 13:29:49.865 [INFO][1] main.go 402: Starting controller ControllerType="Node" 2021-01-07 13:29:49.865 [INFO][1] node_controller.go 138: Starting Node controller 2021-01-07 13:29:49.865 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready 2021-01-07 13:29:49.865 [INFO][1] node_syncer.go 40: Node controller syncer status updated: wait-for-ready 2021-01-07 13:29:49.865 [INFO][1] watchersyncer.go 147: Starting main event processing loop 2021-01-07 13:29:49.865 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes" 2021-01-07 13:29:49.866 [INFO][1] resources.go 349: Main client watcher loop 2021-01-07 13:29:49.878 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes" 2021-01-07 13:29:49.878 [INFO][1] watchersyncer.go 127: Sending status update Status=resync 2021-01-07 13:29:49.878 [INFO][1] node_syncer.go 40: Node controller syncer status updated: resync 2021-01-07 13:29:49.878 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches 2021-01-07 13:29:49.878 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync 2021-01-07 13:29:49.878 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync 2021-01-07 13:29:49.878 [INFO][1] node_syncer.go 40: Node controller syncer status updated: in-sync 2021-01-07 13:29:49.889 [INFO][1] hostendpoints.go 90: successfully synced all hostendpoints 2021-01-07 13:29:49.965 [INFO][1] node_controller.go 151: Node controller is now running 2021-01-07 13:29:49.965 [INFO][1] ipam.go 45: Synchronizing IPAM data 2021-01-07 13:30:09.885 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://kubestagemaster.svc.codfw.wmnet:6443/a pis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded 2021-01-07 13:30:09.885 [ERROR][1] main.go 207: Failed to verify datastore error=Get "https://kubestagemaster.svc.codfw.wmnet:6443/apis/crd.projectcalico.org/v1/clusterinformati ons/default": context deadline exceeded 2021-01-07 13:30:41.885 [ERROR][1] main.go 238: Failed to reach apiserver error=Get "https://kubestagemaster.svc.codfw.wmnet:6443/apis/crd.projectcalico.org/v1/clusterinformatio ns/default": context deadline exceeded ... last 3 lines repeating every ~20sec ... 2021-01-07 13:44:53.919 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://kubestagemaster.svc.codfw.wmnet:6443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded 2021-01-07 13:44:53.920 [ERROR][1] main.go 207: Failed to verify datastore error=Get "https://kubestagemaster.svc.codfw.wmnet:6443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded 2021-01-07 13:45:25.920 [ERROR][1] main.go 238: Failed to reach apiserver error=Get "https://kubestagemaster.svc.codfw.wmnet:6443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded W0107 13:45:29.196341 1 reflector.go:326] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: read tcp [2620:0:860:babe:9763:6d0c:9ec5:a6]:38386->[2620:0:860:104:10:192:48:10]:6443: read: connection timed out") has prevented the request from succeeding 2021-01-07 13:45:29.196 [INFO][1] ipam.go 45: Synchronizing IPAM data 2021-01-07 13:45:29.196 [INFO][1] resources.go 377: Terminating main client watcher loop 2021-01-07 13:45:29.196 [ERROR][1] runconfig.go 212: error watching KubeControllersConfiguration error=<nil> 2021-01-07 13:45:29.196 [INFO][1] watchercache.go 121: Watch error received from Upstream ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=an error on the server ("unable to decode an event from the watch stream: read tcp [2620:0:860:babe:9763:6d0c:9ec5:a6]:38386->[2620:0:860:104:10:192:48:10]:6443: read: connection timed out") has prevented the request from succeeding 2021-01-07 13:45:29.196 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes" 2021-01-07 13:45:29.546 [INFO][1] ipam.go 191: Node and IPAM data is in sync 2021-01-07 13:45:30.222 [INFO][1] resources.go 349: Main client watcher loop
After that, everything is fine and the kube-controllers work as expected.
Logs are from docker-registry.wikimedia.org/calico/kube-controllers:v3.17.1 running on kubestage2002.codfw.wmnet
This looks very similar to:
- https://github.com/projectcalico/calico/issues/3751
- https://github.com/projectcalico/libcalico-go/issues/1267
But I don't get which change may have caused this (as there are only two between 3.17.0 and 3.17.1:
https://github.com/projectcalico/kube-controllers/compare/v3.17.0...v3.17.1
With 3.17.0 I'm unable to reproduce this.